SOME VALIDITY CRITERIA FOR STATISTICAL INFERENCES


By Robert J. Buehler
Iowa State University


1. Introduction. This paper is concerned with the ways in which existing 
statistical theories specify the degree of uncertainty of an inference. For the
sake of graphic presentation, problems of inference are described in terms of a 
game between two players—one who makes the inferences and another who 
questions their validity. Such a model suggests a number of criteria of validity 
which depend entirely on classical probability calculations. I anticipate that 
arguments may be advanced for not regarding statistical inference as a game; 
it is hoped that the cogency of such arguments will not prevent the present model 
from providing some new insight into the problems here considered. 

Many, though not all, problems of inference lead to assertions of the type, 
“The probability that A is true is equal to $\alpha$,” or, “$P(A) = \alpha$.” One may ask
whether the person making this assertion should be willing to bet that A is true,
risking an amount $\alpha$ to win $1 - \alpha$, and should be equally willing to bet that A
is false, risking $1 - \alpha$ to win $\alpha$, against an opponent who has exactly the same
information as he and who is allowed to choose either side of the wager. The 
affirmative answer will not be defended here, but its consequences will be 
examined. 

The game viewpoint is related to, but not identical with, the ideas of von
Mises, who has advanced as a postulate “the impossibility of a gambling system”
in his definition of probability (see for example [19], p. 15). It has generally been
recognized that modern theories of inference, which avoid the assumption of
prior distributions of the parameters, should not have the same interpretation
as the classical Bayes-Laplace theory based on prior distributions. The present
paper attempts to show the sense in which one pays for weakening the classical
assumptions by losing the von Mises postulate for the inferences “$P(A) = \alpha$.”

Sections 2 to 5 are devoted to the theory of confidence intervals; in Sections 
6 to 9 the ideas are generalized to include other statistical problems. The reader 
is warned not to expect to find any new problems solved in this paper, for at the 
present stage of development the theory gives at best new ways of looking at 
existing solutions. 


2. A model for studying interval estimation. In the spirit of the introductory 
section the problem of interval estimation is here studied in terms of a game 
between two players. The players have equal knowledge about fixed conditions 
K (for “known”) of a random experiment, for example, knowledge that n
values are observed from a normal population having unit variance. Unknown
conditions of the experiment, for example the value of the population mean $\mu$,
are conveniently referred to as the “state of nature” U (for “unknown”). The
first player, Peter, has the familiar task of setting confidence intervals. It is
required that he formulate a rule R which determines the interval as a function
of the observations. Then on the basis of the observations he makes a probability
assertion


“$P(A) = \alpha$”.


The quotation marks are used to identify an expression as Peter’s assertion and
to warn that it may not have validity in a direct probability or frequency sense,
for that restriction is not imposed. The assertions may have validity as
“confidence probabilities” or “fiducial probabilities,” these being special cases. In the
unit-normal example one would commonly take $\alpha$ to be 0.95 and A to be


$$\bar{x} - 1.96/\sqrt{n} \le \mu \le \bar{x} + 1.96/\sqrt{n}.$$


In order that the second player, Paul, have information equal to Peter’s, it
is required that he have knowledge of Peter’s rule R as well as of the
experimental conditions and observations. Paul adopts a strategy S based on R and
on the experimental conditions, and consisting in the specification of two
subsets $C^+$ and $C^-$ of the observation space such that

for observations in $C^+$ Paul bets that A is true, risking $\alpha$ to win $1 - \alpha$;
for observations in $C^-$ Paul bets that A is false, risking $1 - \alpha$ to win $\alpha$.

It is not required that a bet must always be made; thus $C^+$ and $C^-$ need not be
exhaustive. To determine the winner of each bet, we postulate the existence of a
referee who knows the true state of nature.

2.1. The criterion of weak exactness. If the model is adequately specified, one 
should in principle be able to calculate the expected gain to Paul. For any fixed 
experimental conditions K the expected gain would be a function of (i) the 
state of nature U, (ii) Peter’s rule R, and (iii) Paul’s strategy S. Different 
criteria for the sensibility of Peter’s rule might be put forward in terms of this 
expected gain. For example, I propose the following. Suppose Paul’s strategy is 
to bet consistently that A is false, regardless of the observations. Then if Paul’s 
expected gain is zero for all U, Peter’s rule R will be defined to be weakly exact.¹

Weak exactness is (at least from the point of view of some theories of
probability) essentially equivalent to writing $P(A) = \alpha$ without quotation marks,
the definition above being preferred on the grounds that it is less subject to
misinterpretation. It will be noted that weak exactness is a property possessed
by Neyman’s confidence intervals (see for example [17]), but not necessarily by
the fiducial counterparts, as is known.
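The criterion can be checked numerically for the unit-normal example. The following is a minimal sketch, not part of the original argument: the function names, the sample size n = 4, the seed, and the trial count are all illustrative assumptions. Paul always bets that A is false, risking $1 - \alpha$ to win $\alpha$, and his average gain is estimated by simulation for several states of nature.

```python
import random
from statistics import NormalDist


def paul_gain_betting_false(covered: bool, alpha: float) -> float:
    """Paul's payoff on one bet that A is false: he risks 1 - alpha to win alpha."""
    return -(1.0 - alpha) if covered else alpha


def mean_gain(mu: float, n: int = 4, alpha: float = 0.95,
              trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Paul's expected gain for a given state of nature mu.

    Peter's rule is the usual interval xbar +/- z / sqrt(n) for unit variance.
    n, trials and seed are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + alpha / 2.0)   # about 1.96 for alpha = 0.95
    half_width = z / n ** 0.5
    total = 0.0
    for _ in range(trials):
        # The mean of n unit-variance normal values has standard deviation 1/sqrt(n).
        xbar = rng.gauss(mu, 1.0 / n ** 0.5)
        total += paul_gain_betting_false(abs(xbar - mu) <= half_width, alpha)
    return total / trials
```

Calling `mean_gain(mu)` for widely different values of `mu` should give averages close to zero in each case, which is what weak exactness asserts; the simulation illustrates the property and does not prove it.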

¹ Strongly exact is defined in Section 6.1.

2.2. Relevant and semirelevant subsets. In this paper the ultimate calculation of
the expected gain is actually made only once (in Section 4.2). The emphasis lies
more in studying Paul’s initial search for a strategy having a guaranteed winning
percentage. If we call $P(A \mid C) - \alpha$ the bias of C, then Paul’s problem is to
find subsets C whose bias has the same sign for all U. These will be called
semirelevant subsets induced by R. If moreover the bias is bounded away from zero,
they will be called relevant. That is, if $\epsilon > 0$ is independent of U, then C is called

semirelevant if $P(A \mid C) > \alpha$ for all U, or if $P(A \mid C) < \alpha$ for all U;

relevant if $P(A \mid C) \ge \alpha + \epsilon$ for all U, or if $P(A \mid C) \le \alpha - \epsilon$ for all U.

The phrase “induced by R” is crucial; the defined properties are necessarily
relative to the rule R, which enters the defining equations through A. The
definitions are inspired largely by writings of Fisher, of which the following
quotations from [11] are typical:

p. 32. “... no such subset can be recognized.”

p. 33. “... inability to discriminate any of the different subaggregates having
different limiting frequency ratios.”

p. 57. “... every subset to which it belongs, and which is characterized by a
different fraction must be unrecognizable.”

More recently ([12], p. 23) Fisher uses the words “relevant” and “irrelevant.”
For example:
“The subset of throws made on a Tuesday is easily recognizable, it has,
however the same probability as the whole set and is therefore irrelevant.”

It would seem that any subset C of the observation space might be called
“recognizable” in Fisher’s sense since it is determined by known observations.
The word “relevant” has been introduced here in a sense intended to be close to
Fisher’s, but there seems to be at least a small difference arising from the
dependence on the rule R. If a need for distinct terms should arise, “induced relevant
subset” might be substituted for “relevant subset” as used here.

In typical interval estimation problems the bias of most subsets C will not 
have the same sign for all U. In particular if C contains only a single point of the 
observation space, then P(A | C) ordinarily will be either zero or unity, de- 
pending on U. This corresponds to the statement, sometimes seen in textbooks, 
that the probability that the true value of the parameter lies within a confidence 
interval is either zero or unity after the interval has been constructed. 

2.3. Relevance of unions and complements. It will be convenient later to refer 
to the following elementary results. 

Lemma 1. If subsets $C_1$ and $C_2$ are disjoint, (semi)relevant, and positively
[negatively] biased, and if the union $C_1 + C_2$ has nonzero probability for all U,
then the union is (semi)relevant and positively [negatively] biased.

We may remark that for nondisjoint subsets neither the union nor the
intersection need have special properties of relevance deriving from the components.

Lemma 2. Let $C'$ denote the complement of C. If (i) $P(A) = \alpha$ (essentially
weak exactness); (ii) $P(C)$ and $P(C')$ are nonzero for all U; (iii) $P(C) > b > 0$
(b is a constant independent of U); and (iv) C is relevant; then $C'$ is relevant with
bias opposite to C.

Lemma 3. If in Lemma 2 (iii) is not assumed, then $C'$ is semirelevant with bias
opposite to C.

Lemma 4. If in Lemma 2 (iii) is not assumed and if (iv) is replaced by “C is
semirelevant,” then $C'$ is semirelevant with bias opposite to C.


3. Examples of relevant subsets in interval estimation problems. We now 
give an assortment of six examples of relevant subsets induced by systems of 
confidence intervals. 

3.1. An example of intervals based on an insufficient statistic. This first example,
although relatively simple, illustrates a number of interesting points. Let a
sample of size n = 2 be drawn from a normal population having unknown mean
$\mu$ and unit variance. A confidence interval based on the first observation $x_1$, but
ignoring the second $x_2$, corresponds to the weakly exact probability assertion


“$P(x_1 - 1.96 \le \mu \le x_1 + 1.96) = 0.95$”.


Paul might logically begin his search for relevant subsets by comparing this
questionable assertion with the standard one based on the sufficient statistic
$\bar{x} = \tfrac12(x_1 + x_2)$. The comparison shows that when $x_1 = x_2$ the intervals are
correctly placed but unduly long, and when $|x_1 - x_2|$ is large they are “badly
placed.” A clue is thus furnished which suggests conditioning on subsets defined
in terms of the statistic $\delta = x_1 - x_2$. If C denotes any such subset and A denotes
$|x_1 - \mu| \le 1.96$, then the conditional probability is


$$P(A \mid C) = P(AC)/P(C) = \iint_{AC} f\,dx_1\,dx_2 \Big/ \iint_{C} f\,dx_1\,dx_2$$


where $f\,dx_1\,dx_2$ is the density $(1/2\pi)\exp\{-\tfrac12(x_1 - \mu)^2 - \tfrac12(x_2 - \mu)^2\}\,dx_1\,dx_2$. On
setting $y_1 = x_1 - \mu$, $y_2 = x_2 - \mu$, it is seen that both numerator and denominator
are independent of $\mu$. Thus $P(A \mid C)$ is independent of $\mu$, and all subsets C
defined in terms of $\delta$ will be relevant save for exceptional cases for which
$P(A \mid C) = 0.95$. In particular the sets (of zero probability) for which $\delta = \delta_0$ can be seen
by simple calculation to have negative, zero, positive bias respectively for
$|\delta_0| >, =, < 1.593$. For $\delta_0 = 0$, $P(A \mid C)$ achieves its maximum value of 0.994
(= standard normal area within $\pm 1.96\sqrt{2}$); as $|\delta_0|$ tends to infinity, $P(A \mid C)$
tends to zero. Consider now the three subsets


$$C_1:\ 0 < \delta < 1; \qquad C_2:\ -1 < \delta < 0; \qquad C_3:\ \text{complement of } C_2.$$


$C_1$ may be regarded as being made up of a continuum of positively biased relevant
subsets of zero probability, and by a continuous generalization of Lemma 1 it
follows that $C_1$ has positive bias. By the same reasoning $C_2$ is positively biased;
and by Lemma 2, $C_3$ is negatively biased. One interesting observation is that
$C_1$ is a subset of $C_3$; thus a positively biased subset may sometimes be contained
in a negatively biased one. Furthermore it is possible for a particular observation
to belong simultaneously to two different subsets having opposite bias. Thus
arises the basic question (which will not be resolved in this paper) of the
appropriate subset to which any particular observation should be referred if it is
not to be referred to the universal set of all observations.
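The “simple calculation” for the sets $\delta = \delta_0$ can be sketched as follows. The sketch rests on a standard fact about the bivariate normal distribution that is not spelled out in the text: given $x_1 - x_2 = \delta_0$, the deviation $x_1 - \mu$ is normal with mean $\delta_0/2$ and variance 1/2. The function names and the bisection bounds are illustrative assumptions.

```python
from statistics import NormalDist

PHI = NormalDist().cdf


def coverage_given_delta(delta0: float, z: float = 1.96) -> float:
    """P(|x1 - mu| <= z | x1 - x2 = delta0) for two unit-variance normal values.

    Given delta0, x1 - mu is normal with mean delta0 / 2 and variance 1 / 2.
    """
    sd = 0.5 ** 0.5
    mean = delta0 / 2.0
    return PHI((z - mean) / sd) - PHI((-z - mean) / sd)


def zero_bias_delta(target: float = 0.95, lo: float = 0.0, hi: float = 10.0) -> float:
    """Bisection for the positive delta0 at which the conditional coverage equals target.

    Relies on the coverage decreasing as delta0 moves away from zero.
    """
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if coverage_given_delta(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under these assumptions `zero_bias_delta()` should land near the boundary value 1.593 quoted above, and `coverage_given_delta(0.0)` gives the maximum conditional coverage.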

3.2. An example involving shortest average length. Our second example is inspired
by Cox [4], who treats the testing situation and the criterion of maximum power,
whereas we treat the estimation situation and the criterion of minimum expected
length. Suppose two populations are known to have standard deviations $\sigma_1 = 1$
and $\sigma_2 = 2$. A random choice between the two populations is made and one
normal random variate is observed. If Peter knows which population was sampled
he may treat each population separately in the usual way, asserting


“$P(x - 1.96\sigma_i \le \mu \le x + 1.96\sigma_i) = 0.95$”


in which both $x$ and $\sigma_i$ represent observed values. On the other hand, if he wished
to minimize the average length of the intervals, he would do best to increase
the error rate for the second population to about seven per cent and compensate
by decreasing the error rate for the first population to about three per cent,
thus maintaining a five per cent average. The first solution may be called a
“conditional” one (in the sense of conditional probability) inasmuch as $P(A) = \alpha$
is valid in the frequency sense, with the relative frequency $\alpha$ prevailing
separately in the two subsets defined by the value of $\sigma$. The second solution is
“unconditional” in that $P(A) = \alpha$ is valid in the frequency sense only when related
to the sequence of all observations, ignoring the observed value of $\sigma$. Both
solutions are weakly exact. The two subsets defined by the value of $\sigma$ are
conspicuously relevant for the unconditional solution; thus the criteria of shortest
average length and nonexistence of relevant subsets cannot both be met. I
myself would prefer the conditional solution here.
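The trade-off between the two error rates can be reproduced with a small grid search. This is a sketch under two stated assumptions: the two populations are chosen with equal probability, and the intervals are symmetric about the observation. The function names and grid resolution are illustrative.

```python
from statistics import NormalDist

INV = NormalDist().inv_cdf


def average_length(e1: float, avg_error: float = 0.05,
                   sigma1: float = 1.0, sigma2: float = 2.0) -> float:
    """Average interval length when population 1 gets error rate e1.

    Assumes each population is sampled with probability 1/2, so the second
    error rate is fixed by e1 + e2 = 2 * avg_error.
    """
    e2 = 2.0 * avg_error - e1
    half1 = INV(1.0 - e1 / 2.0) * sigma1   # half-width for population 1
    half2 = INV(1.0 - e2 / 2.0) * sigma2   # half-width for population 2
    return 0.5 * (2.0 * half1) + 0.5 * (2.0 * half2)


def best_split(avg_error: float = 0.05, steps: int = 9999):
    """Grid search over e1 in (0, 2 * avg_error) for the shortest average length."""
    grid = [2.0 * avg_error * i / (steps + 1) for i in range(1, steps + 1)]
    e1 = min(grid, key=lambda e: average_length(e, avg_error))
    return e1, 2.0 * avg_error - e1
```

With these assumptions the search settles near three per cent for the first population and near seven per cent for the second, and the resulting average length is shorter than the conditional solution's, obtained from `average_length(0.05)`.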

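3.3. An example of relevant subsets defined by an ancillary statistic. The
following example uses some distributional results given by Fisher ([11], pp. 163-165),
to illustrate how relevant subsets may be defined by means of an ancillary
statistic. Consider a sample of size n from the bivariate density


$$e^{-\theta x - y/\theta}\,dx\,dy, \qquad 0 < x, y;\quad 0 < \theta < \infty,$$


and define statistics $T$ (the maximum likelihood estimator of $\theta$) and $V$ (an
ancillary statistic, according to Fisher) by


$$T = \Big(\sum y \Big/ \sum x\Big)^{1/2}, \qquad V = \Big(\sum x \sum y\Big)^{1/2}.$$

Then $T$ and $V$ have the joint density


$$\frac{2V^{2n-1}}{[(n-1)!]^2}\exp\Big\{-V\Big(\frac{T}{\theta} + \frac{\theta}{T}\Big)\Big\}\,dV\,\frac{dT}{T},$$


and the marginal and conditional densities of $T$ are respectively


$$K\Big(\frac{T}{\theta} + \frac{\theta}{T}\Big)^{-2n}\frac{dT}{T} \qquad\text{and}\qquad K(V)\exp\Big\{-V\Big(\frac{T}{\theta} + \frac{\theta}{T}\Big)\Big\}\frac{dT}{T},$$

where $K$ and $K(V)$ are independent of $T$. The two distributions may be used to
give respectively unconditional and conditional systems of confidence intervals
for $\theta$. To use the former, put $\beta = T/\theta$ and determine constants $\beta_1$ and $\beta_2$ such
that


$$\int_{\beta_1}^{\beta_2} (\beta + \beta^{-1})^{-2n}\,\frac{d\beta}{\beta} = \alpha \int_0^{\infty} (\beta + \beta^{-1})^{-2n}\,\frac{d\beta}{\beta}.$$

This leads to the assertion


“$P(A) = P(\beta_1 \le T/\theta \le \beta_2) = P(T/\beta_2 \le \theta \le T/\beta_1) = \alpha$”.


But conditionally upon the value of the ancillary $V$ the probability is


$$P(A \mid V) = \int_{\theta\beta_1}^{\theta\beta_2} \exp\Big\{-V\Big(\frac{T}{\theta} + \frac{\theta}{T}\Big)\Big\}\frac{dT}{T} \Big/ \int_0^{\infty} \exp\Big\{-V\Big(\frac{T}{\theta} + \frac{\theta}{T}\Big)\Big\}\frac{dT}{T}$$

$$= \int_{\beta_1}^{\beta_2} \exp\{-V(\beta + \beta^{-1})\}\frac{d\beta}{\beta} \Big/ \int_0^{\infty} \exp\{-V(\beta + \beta^{-1})\}\frac{d\beta}{\beta}.$$


The last expression depends on the ancillary $V$ but is independent of the
parameter $\theta$; it will equal $\alpha$ only for a particular intermediate value of $V$, and for
all other values relevant subsets will be defined.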

3.4. Another example involving an ancillary statistic. Turning to Fisher ([9],
and [11], p. 134) for another example involving an ancillary statistic, we let $x$
and $y$ have a circular normal distribution with unit variance and with mean on a
circle of known radius $R$ so that the density is


$$(1/2\pi)\exp\{-\tfrac12(x - R\cos\theta)^2 - \tfrac12(y - R\sin\theta)^2\}\,dx\,dy$$


where $\theta$ is to be estimated. If a single observation $(x, y)$ is expressed in polar
coordinates by

$$a^2 = x^2 + y^2, \qquad x = a\cos\hat\theta,$$

$$\hat\theta = \tan^{-1}(y/x), \qquad y = a\sin\hat\theta,$$


then $\hat\theta$ is the maximum likelihood estimator of $\theta$, and $a$ is ancillary. One easily
obtains the joint density


$$(a/2\pi)\exp\{-\tfrac12(a^2 + R^2 - 2aR\cos(\hat\theta - \theta))\}\,da\,d\hat\theta.$$

An unconditional system of confidence intervals is given by

“$P(\hat\theta - \gamma \le \theta \le \hat\theta + \gamma) = \alpha$”

where $\alpha$ is the fraction of the unit circular normal distribution inside a wedge
of angle $2\gamma$, the maximum density being centered in the wedge at a distance $R$
from the vertex. To find the probability conditioned on the ancillary, we find
the marginal distribution of $a$ to be expressible in terms of the Bessel function
$I_0$ by

$$\frac{a\,da}{2\pi}\exp\{-\tfrac12(a^2 + R^2)\}\int_0^{2\pi}\exp\{aR\cos(\hat\theta - \theta)\}\,d\hat\theta = a\exp\{-\tfrac12(a^2 + R^2)\}\,I_0(aR)\,da.$$

Thus the conditional density of $\hat\theta$ given $a$ is


$$\frac{\exp\{aR\cos(\hat\theta - \theta)\}}{2\pi I_0(aR)}\,d\hat\theta,$$


and the conditional probability for a given value of $a$ is


$$P(\hat\theta - \gamma \le \theta \le \hat\theta + \gamma \mid a) = \frac{1}{2\pi I_0(aR)}\int_{-\gamma}^{\gamma} e^{aR\cos\beta}\,d\beta.$$

As in the previous example, the conditional probability depends on the ancillary
and is independent of the parameter. Here it approaches unity as $a$ tends to
infinity and approaches $\gamma/\pi$ as $a$ tends to zero. Thus subsets constructed from
observations lying near the origin will be negatively biased; the variability of $\hat\theta$
for these considered separately is larger than the variability of $\hat\theta$ for all
observations collectively.
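The two limits of the conditional probability can be checked numerically. The sketch below evaluates the ratio of the wedge integral to the full-circle integral by Simpson's rule, using the fact that $2\pi I_0(aR)$ equals the integral of $e^{aR\cos\beta}$ over a full period. The function names and step count are illustrative assumptions.

```python
import math


def _simpson(f, lo: float, hi: float, steps: int = 2000) -> float:
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def conditional_coverage(a: float, R: float, gamma: float) -> float:
    """P(|theta_hat - theta| <= gamma | a) as a ratio of two integrals.

    The denominator, the integral over (-pi, pi), plays the role of 2*pi*I0(aR).
    """
    k = a * R

    def g(beta: float) -> float:
        # Subtracting k in the exponent rescales numerator and denominator
        # by the same factor and avoids overflow for large a * R.
        return math.exp(k * (math.cos(beta) - 1.0))

    return _simpson(g, -gamma, gamma) / _simpson(g, -math.pi, math.pi)
```

For a fixed wedge half-angle, `conditional_coverage` should rise from about `gamma / math.pi` for observations near the origin towards unity for observations far from it.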

3.5. An example involving the Behrens-Fisher problem. Behrens’ hypothesis
states that the means $\mu_1$, $\mu_2$ of two normal populations differ by $\delta$ (= 0, usually),
no assumption of the equality of variances $\sigma_1^2$, $\sigma_2^2$ being made. We consider the
test devised by Welch [21], tabulated by Aspin [1], and appearing as Table 11 in
the Biometrika Tables of Pearson and Hartley [18]. This test rejects the null
hypothesis $H_0$: $\mu_1 - \mu_2 = \delta$ when $|\bar{x}_1 - \bar{x}_2 - \delta| > v(s_1^2/n_1 + s_2^2/n_2)^{1/2}$ where
$n$, $\bar{x}$, $s^2$ denote size, mean, and variance of the samples and $v$ (the analog of
Student’s $t$) is a tabulated function of $n_1$, $n_2$, $s_1^2$, $s_2^2$, and the significance level.
A calculation of Fisher [10] shows that for fixed $s_2/s_1$ the conditional probability
$P(\text{reject} \mid H_0, s_2/s_1)$ can be expressed as a unique function of the ratio $\sigma_2^2/\sigma_1^2$;
and that when $n_1 = n_2 = 7$, $s_1 = s_2$ and the significance level is 0.1, then
$P(\text{reject} \mid H_0, s_1 = s_2)$ achieves a minimum value² of 0.108 when $\sigma_2^2/\sigma_1^2 = 1$.
Now if the tabulated test is translated into a system of confidence intervals, one
obtains the probability assertion


“$P(A) = P\{|(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)| \le v(s_1^2/n_1 + s_2^2/n_2)^{1/2}\} = \alpha$”

for which Fisher’s calculation shows that when $n_1 = n_2 = 7$, $\alpha = 0.9$,

$$P(A \mid C) \le \alpha - \epsilon \quad\text{for all } \mu_1, \mu_2, \sigma_1, \sigma_2$$


where $\epsilon = 0.008$ and C is the subset $s_1 = s_2$.

Thus the tabulated solution induces a negatively biased relevant subset. It is
noteworthy that in this example $P(A \mid C)$ depends on the ratio $\sigma_2^2/\sigma_1^2$ whereas
in all four preceding examples $P(A \mid C)$ was independent of the parameters.
One might well wish to distinguish between these two types of relevance,
although separate names have not been advanced here.

In [10] Fisher seems to imply that subsets defined by fixed values of $s_2/s_1$
are uniquely appropriate reference sets for inferences about Behrens’ hypothesis;
but like Bartlett [2] and Welch [22], I do not find his reasons to be compelling.
Fisher does show that the subset $s_1 = s_2$ is not relevant (in the present technical
sense) for Behrens’ solution, but he does not consider the possible relevance of
other subsets. Wallace [20] has since shown that no such relevant subsets exist.

² Fisher gives a graph but no table. The value 0.108 is obtained from Welch [22].

3.6. An example in which an individual point of the sample space is a relevant
subset. As a rule the property of being relevant belongs to a set of possible
observations and not to individual points of the observation space. But in rare
cases individual points may be relevant, as can be illustrated by a familiar
component-of-variance problem. Let

$$y_{ij} = \mu + a_i + e_{ij} \qquad (i = 1, \ldots, r;\ j = 1, \ldots, s)$$


in which $\mu$ is an unknown parameter, $a_i$ and $e_{ij}$ are normal and independently
distributed with zero means and variances $\sigma_a^2$ and $\sigma^2$. Using the dot notation to
indicate an average over the missing subscript, the sums of squares


$$Q_1 = \sum\sum (y_{ij} - y_{i\cdot})^2 \quad\text{and}\quad Q_2 = \sum\sum (y_{i\cdot} - y_{\cdot\cdot})^2$$

are known to have independent distributions given in terms of chi-square by
$\sigma^2\chi^2_{r(s-1)}$ and $(\sigma^2 + s\sigma_a^2)\chi^2_{r-1}$.


Thus the quantity

$$F = \frac{(\sigma^2 + s\sigma_a^2)\,f_2 Q_1}{\sigma^2 f_1 Q_2} = \frac{\chi^2_{f_1}/f_1}{\chi^2_{f_2}/f_2}$$


has the $F$ distribution with $f_1 = r(s - 1)$ and $f_2 = r - 1$ degrees of freedom.
If lower and upper percentage points $F_1$ and $F_2$ are chosen so that


$$P(F_1 \le F \le F_2) = \alpha,$$


then on rearranging the inequalities one finds that the probability is $\alpha$ that


$$\frac{1}{s}\Big\{\frac{f_1 Q_2}{f_2 Q_1}F_1 - 1\Big\} \le \frac{\sigma_a^2}{\sigma^2} \le \frac{1}{s}\Big\{\frac{f_1 Q_2}{f_2 Q_1}F_2 - 1\Big\}.$$

If the ratio of mean squares is larger than the upper percentage point,

$$\frac{Q_1/f_1}{Q_2/f_2} > F_2,$$


as always can happen with at least some small probability, then the confidence
interval includes only negative values. The probability of covering the true ratio
$\sigma_a^2/\sigma^2$ conditionally on any such observation is zero; thus such individual points
constitute negatively biased relevant subsets.

Now let C denote the collection of all such points, that is, all observations for
which the last inequality is satisfied. Then C is relevant and negatively biased.
Is increased confidence thereby justified for observations in the complementary
set $C'$? That $C'$ is semirelevant follows from Lemma 3. Using $P(A \mid C) = 0$ one
obtains


$$P(A \mid C') = \alpha/P(C') = \alpha + \alpha P(C)/P(C')$$

so that the bias is $\alpha P(C)/P(C')$. Since $P(C)$ tends to zero as $\sigma^2/\sigma_a^2$ tends to
zero, there is no positive lower bound for the bias. Thus the complementary set
$C'$ is not relevant. Loosely summarizing these results: within C one’s confidence
is necessarily zero; within $C'$, $\alpha$ is the greatest lower bound for the confidence
over the possible states of nature.


4. Examples of semirelevant subsets. From the following examples it will be 
seen that semirelevant subsets are more easily found than relevant ones and that 
the requirement that none should exist is a very severe restriction indeed. 

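4.1. A nonparametric assertion about the median. Let $g(t)$ be any continuous
density which is positive for $-\infty < t < \infty$ and which has median equal to zero.
If one observation $x$ is taken from the density $g(x - \theta)$ ($\theta$ = median), then the
assertion


“$P(A) = P(x < \theta < \infty) = \tfrac12$”


is weakly exact. Consider the subset C defined by $x > 0$. The conditional
probability $P(A \mid C) = P(x < \theta \mid x > 0) = P(0 < x < \theta)/P(x > 0)$ is zero for negative $\theta$,
while for positive $\theta$ one has


$$P(0 < x < \theta) = \int_0^{\theta} g(x - \theta)\,dx = \int_{-\theta}^{0} g(t)\,dt = \tfrac12 - \int_{-\infty}^{-\theta} g(t)\,dt$$


for all $0 < \theta < \infty$. Writing $p(\theta)$ for the last integral, one has $P(x > 0) = 1 - p(\theta)$,
so that $P(A \mid C) = (\tfrac12 - p)/(1 - p) = \tfrac12 - p/\{2(1 - p)\} < \tfrac12$. Thus C is
semirelevant, negatively biased. The integral $p(\theta)$, which determines the bias
$-p/\{2(1 - p)\}$ as a function of $\theta$, is seen to tend to zero as $\theta$ tends to
infinity; thus C is semirelevant only, not relevant.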
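A numerical illustration of this behaviour, taking $g$ to be the standard normal density purely as an assumed example (any continuous, everywhere-positive density with median zero would serve), might look as follows. The function name is illustrative.

```python
from statistics import NormalDist


def coverage_given_positive_x(theta: float, g: NormalDist = NormalDist()) -> float:
    """P(x < theta | x > 0) when x - theta follows the median-zero distribution g.

    g defaults to the standard normal; this is an assumption made for the sketch.
    """
    p_c = 1.0 - g.cdf(-theta)                      # P(x > 0)
    p_ac = max(0.0, g.cdf(0.0) - g.cdf(-theta))    # P(0 < x < theta); zero if theta <= 0
    return p_ac / p_c
```

Evaluating the function over a range of `theta` shows conditional probabilities that are zero for negative `theta`, stay below 1/2 for every positive `theta`, and creep up towards 1/2 as `theta` grows, so that no fixed positive bound on the bias holds uniformly.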

4.2. Student’s t. Let $x$ be normally distributed with unknown mean
$\mu$ ($-\infty < \mu < \infty$) and unknown variance $\sigma^2$ ($0 < \sigma < \infty$). If $\bar{x}$ and $s^2$ denote
the sample mean and variance, then the conventional confidence or fiducial
interval estimate of $\mu$ corresponds to the assertion

“$P(\bar{x} - ks \le \mu \le \bar{x} + ks) = \alpha$”


where $k = t_{\alpha,n-1}/\sqrt{n}$ and $t_{\alpha,n-1}$ is the appropriate percentage point of Student’s
$t$. We shall first show that for any $a > 0$, the subset $s < a$ is semirelevant,
negatively biased. Stated in another way, for any $\mu$, $\sigma^2$, “long” intervals cover
the true value more frequently than “short” intervals, and this fact holds for
any critical length which is used to distinguish “long” intervals from “short”
ones.

Denote by A the event $|\bar{x} - \mu| \le ks$, by C and $C'$ the subset $s < a$ and its
complement $s \ge a$, by $\sigma^{-1}f(s/\sigma)\,ds$ and $g(\bar{x} - \mu)\,d\bar{x}$ the independent densities
of $s$ and $\bar{x}$. Then


$$P(C) = \int_0^a \sigma^{-1}f(s/\sigma)\,ds = \phi(a/\sigma), \text{ say,}$$

and

$$P(AC) = \int_0^a \sigma^{-1}f(s/\sigma)\int_{\mu - ks}^{\mu + ks} g(\bar{x} - \mu)\,d\bar{x}\,ds.$$

If we put

$$G(s) = \int_{\mu - ks}^{\mu + ks} g(\bar{x} - \mu)\,d\bar{x} = \int_{-ks}^{ks} g(z)\,dz,$$


then by a mean value theorem

$$P(AC) = \int_0^a \sigma^{-1}f(s/\sigma)\,G(s)\,ds = G(s_0)\int_0^a \sigma^{-1}f(s/\sigma)\,ds = G(s_0)\,\phi(a/\sigma)$$

where $0 < s_0 < a$. This gives

$$P(A \mid C) = P(AC)/P(C) = G(s_0).$$

By a similar argument

$$P(A \mid C') = G(s_0')$$


where $a < s_0' < \infty$. But $G$ is an increasing function so that $s_0 < a < s_0'$ implies
$G(s_0) < G(s_0')$. The values of $s_0$ and $s_0'$ depend on $a$ and $\sigma$, but for any particular
$a$, $\sigma$, weak exactness implies that C and $C'$ must have opposite bias. Thus for
all $a$, $\sigma$, $\mu$,


$$G(s_0) = P(A \mid C) < \alpha < P(A \mid C') = G(s_0')$$


so that C and $C'$ are semirelevant.

To illustrate the situation more clearly, we consider the simple case $n = 2$,
$\alpha = \tfrac12$, for which $s^2 = \tfrac12(x_1 - x_2)^2$ and the assertion is that the mean has an
even chance of lying between the two observations:


“$P(A) = P(x_{\min} \le \mu \le x_{\max}) = \tfrac12$”.


For the subset $C'$: $s \ge a$, the conditional probability is

$$P(A \mid C') = \tfrac12 + \tfrac12\phi(a/\sigma)$$


where $\phi(a/\sigma)$ is the standard normal probability between $-a/\sigma$ and $+a/\sigma$. Thus
the bias, $\tfrac12\phi(a/\sigma)$, is always positive, increasing from 0 to $\tfrac12$ as $a$ increases from
0 to $\infty$. If Paul adopts the strategy: bet even odds that A is true when $s \ge a$
and bet even odds that A is false when $s < a$, then the probability that Paul
wins is


$$P = \tfrac12 + \phi(1 - \phi), \qquad \phi = \phi(a/\sigma),$$


which lies in the range $1/2 \le P \le 3/4$, and Paul’s expected gain on these bets
of 1/2 unit is


$$G = \phi(1 - \phi)$$


which always lies in the range $0 \le G \le 1/4$. The maxima of $P$ and $G$ are
attained when $\phi = 1/2$, that is, when


$$a = 0.67\sigma, \text{ approximately.}$$


Thus the optimum value of $a$ is proportional to $\sigma$; and for dimensional reasons
this is true generally, the proportionality factor depending on $n$ and $\alpha$.
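Paul's strategy for the case $n = 2$, $\alpha = \tfrac12$ is easy to simulate. The sketch below compares the closed-form winning probability with a Monte Carlo estimate; the threshold is written as $a = c\sigma$, and the values of `mu`, `sigma`, the seed, and the trial count are illustrative assumptions.

```python
import random
from statistics import NormalDist


def win_probability(c: float) -> float:
    """Closed form P = 1/2 + phi * (1 - phi) for the threshold a = c * sigma."""
    phi = 2.0 * NormalDist().cdf(c) - 1.0   # normal probability between -c and +c
    return 0.5 + phi * (1.0 - phi)


def simulate(c: float, mu: float = 3.0, sigma: float = 2.0,
             trials: int = 200_000, seed: int = 1) -> float:
    """Fraction of bets Paul wins when he bets 'A true' if s >= a, else 'A false'."""
    rng = random.Random(seed)
    a = c * sigma
    wins = 0
    for _ in range(trials):
        x1, x2 = rng.gauss(mu, sigma), rng.gauss(mu, sigma)
        s = abs(x1 - x2) / 2 ** 0.5                  # s^2 = (x1 - x2)^2 / 2 for n = 2
        covered = min(x1, x2) <= mu <= max(x1, x2)   # event A
        bets_true = s >= a
        wins += (covered == bets_true)
    return wins / trials
```

Taking `c = NormalDist().inv_cdf(0.75)`, which is about 0.6745, gives the optimum threshold; both functions should then return values close to 3/4, and the expected gain per bet of 1/2 unit is the winning probability minus 1/2.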

In an idealized model Paul will have no prior information concerning $\sigma$ and
thus will have no basis for a choice of $a$ in the range from zero to infinity, but in
practically any applied problem some prior knowledge of $\sigma$ will be available. It
is noteworthy that Fisher has specifically stated that fiducial inference is valid
only in the absence of prior knowledge ([11], p. 51). In contrast, Neyman appears
not to have taken a stand on the applicability of confidence interval theory in
the presence of prior information. I believe that the above calculations tend to
justify Fisher’s restriction. Now the assumption that a prior distribution is
known and the assumption that nothing is known are two boundaries of a vast
intermediate area in which there is partial prior information. We have seen
how Paul can find a crude strategy in this middle area against Peter’s conventional
solution. But what one would really like to know is how Peter’s rule might be
altered to use partial prior knowledge. That is a much more subtle and difficult
question.


5. An example of nonexistence of relevant subsets. Let $g(t)$ be a continuous
density which is nonzero for all $t$, and let $f(x, \theta) = g(x - \theta)$ so that $\theta$ is a
location parameter. From a sample of one value of $x$, $\theta$ may be estimated by the
assertion


“$P(A) = P(-t_1 \le x - \theta \le t_2) = P(x - t_2 \le \theta \le x + t_1) = \alpha$”


where $t_1$ and $t_2$ are chosen so that


$$\int_{-t_1}^{t_2} g(t)\,dt = \alpha.$$


We wish to show that neither of the inequalities


$$(1)\qquad P(A \mid C) \ge \alpha + \epsilon \quad\text{or}\quad P(A \mid C) \le \alpha - \epsilon \qquad (\text{for all } \theta)$$


can hold for any Lebesgue measurable set C of values of $x$ having finite or infinite,
but not zero, Lebesgue measure (one is prevented from treating sets of measure
zero by the nonuniqueness of the definition of conditional probability; see for
example [13], p. 12). Let $a(\theta)$, $b(\theta)$, $A(R)$, $B(R)$ be defined by


$$a(\theta) = P(C), \qquad A(R) = \int_{-R}^{R} a(\theta)\,d\theta,$$

$$b(\theta) = P(AC), \qquad B(R) = \int_{-R}^{R} b(\theta)\,d\theta.$$


Then $P(A \mid C) = b(\theta)/a(\theta)$; and substituting in (1), multiplying by $a(\theta)$,
integrating over $-R < \theta < R$, and dividing by $A(R)$ gives


$$\frac{B(R)}{A(R)} \ge \alpha + \epsilon \quad\text{or}\quad \frac{B(R)}{A(R)} \le \alpha - \epsilon \qquad (\text{for all } R).$$

It is shown in the appendix that subject to weak conditions on $g(t)$,
$B(R)/A(R) \to \alpha$ as $R \to \infty$, and thus the desired result is established by
contradiction. Some sweeping generalizations of this result have been found by
Wallace [20].
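The limiting behaviour of $B(R)/A(R)$ can be illustrated numerically for one particular case. In the sketch below $g$ is taken to be standard normal, $t_1 = t_2 = 1.96$, and C is the interval $0 < x < 1$; all three choices, the function names, and the trapezoid step are assumptions made only for illustration, and the calculation does not replace the appendix argument.

```python
from statistics import NormalDist

PHI = NormalDist().cdf
T = 1.96                   # t1 = t2 = 1.96, so alpha is about 0.95
C_LO, C_HI = 0.0, 1.0      # illustrative subset C: 0 < x < 1


def a(theta: float) -> float:
    """a(theta) = P(C): probability that x falls in C when x - theta ~ N(0, 1)."""
    return PHI(C_HI - theta) - PHI(C_LO - theta)


def b(theta: float) -> float:
    """b(theta) = P(AC): x in C and theta within [x - T, x + T]."""
    lo, hi = max(C_LO, theta - T), min(C_HI, theta + T)
    return PHI(hi - theta) - PHI(lo - theta) if hi > lo else 0.0


def _trapezoid(f, lo: float, hi: float, steps: int) -> float:
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


def coverage_ratio(R: float, steps: int = 20_000) -> float:
    """B(R) / A(R), with both integrals taken over -R < theta < R."""
    return _trapezoid(b, -R, R, steps) / _trapezoid(a, -R, R, steps)
```

For this C the conditional probability `b(theta) / a(theta)` equals 1 at `theta = 0.5` and 0 at `theta = 3.0`, so its bias changes sign with the state of nature, while `coverage_ratio(20.0)` comes out very close to 0.95.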


6. Generalization of the model. It may first be noted that the theory of 
tolerance as well as confidence intervals may be treated simply by taking A to 
be the proposition that at least a certain proportion of the population lies be- 
tween stated limits. 

A useful generalization consists in representing the outcome of the random
experiment by a pair of random variables $(x, y)$ of which $x$ is known and $y$ is
unknown to the players, both being known to the referee. The theory of
prediction intervals is then treated by taking $x$ to be “past” observations and $y$ to
be “future” observations to be predicted (it is immaterial that the “future”
observations $y$ have already been observed by the referee so long as they are
unknown to the two players). The proposition A is then a statement depending
on $x$ concerning the future observations; a notable distinction is that A does not
concern the unknown state of nature U.

In Bayesian interval estimation (Cramér [6], p. 508, Neyman [17], p. 162)
$y$ plays the role of the unknown parameter having a known prior distribution.
Bayesian problems generally have the distinguishing feature that the state of
nature is assumed known so that U does not appear. As in prediction interval
theory, the proposition A gives a relation between $x$ and $y$, the known and
unknown observations.

A further possible generalization which might have some interest, although
it seems not to be required for existing theories, consists in allowing the
confidence coefficient $\alpha$ to depend on the known observations: $\alpha = \alpha(x)$. That is,
the rule R for probability assertions may specify not only how the proposition
A is to depend on $x$ but also how the asserted probability level is to depend on $x$.
The definition of relevant subsets then requires some repair; the logical extension
consists in considering only subsets C of values of $x$ for which $\alpha(x)$ takes a fixed
value. An example of variable $\alpha$ is given in Section 9.

6.1. The criterion of strong exactness. It has been noted that the expected gain
to Paul is a function of the unknown state of nature U, Peter’s rule R, and Paul’s
strategy S; thus it may be denoted by $G(U, R, S)$.

Definition. A rule $R_0$ will be called strongly exact if


$$G(U, R_0, S) = 0 \quad\text{identically for all } U, S.$$


In other words: Whatever the true state of nature and whatever strategy Paul
may use, the expected gain to Paul is zero. It is essentially equivalent to write
$P(A \mid C) = \alpha$ for all U, C.

It appears to be impossible to satisfy the very stringent condition of strong
exactness except in rather special cases, e.g., in Bayesian estimation where the
condition “all U” is in fact no requirement at all since U is not variable. Thus
strong exactness is not so much a practical requirement as a goal toward which
one might strive even though it cannot actually be reached.


7. Remarks on Bayesian interval estimation. In a typical Bayesian situation
an experiment consists in obtaining a random value of the “parameter” from a
known distribution $w(\theta)$ and subsequently observing values $x_1, \ldots, x_n$ from
a distribution $f(x; \theta)$ which is known but for the value of $\theta$. Thus in the model
of Section 6, $x$ represents $x_1, \ldots, x_n$ and $y$ represents $\theta$. The conditional
distribution of $\theta$ given $x_1, \ldots, x_n$ is

$$h(\theta \mid x_1, \ldots, x_n) = \frac{w(\theta)\{f(x_1; \theta) \cdots f(x_n; \theta)\}}{\int w(\theta)\{f(x_1; \theta) \cdots f(x_n; \theta)\}\,d\theta}.$$

A Bayesian estimate of $\theta$ is given by the assertion

“$P(k_1 \le \theta \le k_2) = \alpha$”


where $k_1$ and $k_2$ are any two numbers depending on the observations by


$$\int_{k_1}^{k_2} h(\theta \mid x_1, \ldots, x_n)\,d\theta = \alpha.$$


Apart from the arbitrary allocation of the probability $1 - \alpha$ between the two
tails of the distribution, $k_1$ and $k_2$ are unique functions of the observations. It
may further be noted that the interval $(k_1, k_2)$ based on any particular
observation $(x_1, \ldots, x_n)$ has its own validity without reference to intervals which
might be defined for other observations; this is in contrast to confidence interval
theory in which any particular interval is meaningful only when referred to a
system of intervals defined for all possible samples. A closely connected fact
is that the above Bayesian solution gives a rule for assertions which is strongly
exact.
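The construction of $k_1$ and $k_2$ can be sketched numerically on a grid. The example below assumes a standard normal prior $w(\theta)$, a unit-variance normal $f(x; \theta)$, four made-up observations, and an equal allocation of $1 - \alpha$ to the two tails; none of these choices comes from the text, and the helper names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_on_grid(xs, prior, likelihood, grid):
    """Normalized h(theta | x_1..x_n) on an equally spaced grid (Riemann sum)."""
    step = grid[1] - grid[0]
    weights = [prior(t) * math.prod(likelihood(x, t) for x in xs) for t in grid]
    total = sum(weights) * step
    return [w / total for w in weights]


def equal_tailed_interval(grid, density, alpha: float):
    """k1, k2 leaving probability (1 - alpha) / 2 in each tail of the posterior."""
    step = grid[1] - grid[0]
    tail = (1.0 - alpha) / 2.0
    cum, k1, k2, found_lower = 0.0, grid[0], grid[-1], False
    for t, d in zip(grid, density):
        cum += d * step
        if not found_lower and cum >= tail:
            k1, found_lower = t, True
        if cum >= 1.0 - tail:
            k2 = t
            break
    return k1, k2


def example_interval(alpha: float = 0.95):
    """Illustrative data and prior only; not taken from the paper."""
    xs = [0.3, 1.1, -0.4, 0.9]
    grid = [-6.0 + 0.001 * i for i in range(12001)]
    density = posterior_on_grid(
        xs,
        prior=lambda t: normal_pdf(t, 0.0, 1.0),
        likelihood=lambda x, t: normal_pdf(x, t, 1.0),
        grid=grid,
    )
    return equal_tailed_interval(grid, density, alpha)
```

For this conjugate case the posterior is normal with mean $\sum x_i/(n + 1)$ and variance $1/(n + 1)$, so the grid result can be compared with the exact interval; the same functions accept any prior and likelihood supplied as callables.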

Two variations of the above are found in the literature. Cramér ([6], p. 508) 
replaces h(θ | x_1, ..., x_n) by 

\[
h(\theta \mid \hat\theta) =
  \frac{w(\theta)\,g(\hat\theta;\theta)}{\int w(\theta)\,g(\hat\theta;\theta)\,d\theta},
\]

where θ̂ is an estimator of θ and g is the density of θ̂. Three remarks can be made: 
(i) If θ̂ is sufficient for θ and if the allocation of the probability 1 − α to the 
two tails is fixed, then this solution will not differ from the preceding one. (ii) 
If θ̂ is not sufficient and if the sample (x_1, ..., x_n) is known to the players, 
then relevant subsets will exist and the solution is weakly exact but not strongly 
exact. (iii) If θ̂ is not sufficient, but the players have knowledge only of θ̂ and 
not of (x_1, ..., x_n), then the last solution is strongly exact. 

A second variation is given by Neyman’s “modernized Bayes’ estimating 
intervals” ([17], pp. 165-181) in which the expected length of the intervals is 
shortened by requiring that the asserted probability α refer only to the relative 
frequency in the sequence of all experiments, not separately to sequences of 
fixed (x_1, ..., x_n). Thus in the “modernized Bayes’ solution” we have a 
clear-cut example of a rule that is weakly but not strongly exact. This situation is 
quite similar to the Cox example of Section 3.2. 


8. A familiar prediction example. To show how a prediction problem fits into 
the general scheme and to point out some analogies between prediction and 
confidence intervals, we consider a familiar prediction example. Let x have a 
continuous density of unspecified form, let x_1 and x_2 represent two known 
observations, and let x_3 represent an unknown or future observation. Write 
x_min and x_max for the smaller and larger of x_1 and x_2. Then since 
all six of the orderings of x_1, x_2, x_3 are equally likely, the probability 
assertion 

“P(x_min < x_3 < x_max) = 1/3” 


is weakly exact. Some years ago this particular example figured in a dispute 
between Fisher [7], [8] and Jeffreys [15], [16]. Without claiming to understand 
the subtleties of the dispute we note that Fisher [7] points out that “for any 
particular population the probability will generally be larger when the first 
two observations are far apart than when they are near together.” From Fisher’s 
remark it follows that just as in the Student’s t example of Section 4.2, long 
intervals are valid more often than short ones, and the subsets x_max − x_min ≤ a 
or ≥ a are semirelevant for any a > 0. Fisher goes on to say that “the fallacy 
of Jeffreys’ argument consists just in assuming that the probability shall be 1/3, 
independently of the distance apart of the first two observations.” In the present 
terminology we would say that Jeffreys is accused of treating an assertion as if 
it were strongly exact when in fact it is only weakly exact. 
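
Fisher's remark can be checked by simulation. The sketch below is illustrative only: it assumes a standard normal population and a cutoff of 1 for "near together", neither of which comes from the text, and the function name `coverage_by_range` is hypothetical.

```python
import random

def coverage_by_range(n_trials=200_000, cutoff=1.0, seed=0):
    """Estimate P(x_min < x3 < x_max) overall and split by the range x_max - x_min."""
    rng = random.Random(seed)
    hits = {"all": 0, "near": 0, "far": 0}
    counts = {"all": 0, "near": 0, "far": 0}
    for _ in range(n_trials):
        # Assumed population: standard normal (any continuous density would do).
        x1, x2, x3 = (rng.gauss(0.0, 1.0) for _ in range(3))
        lo, hi = min(x1, x2), max(x1, x2)
        covered = lo < x3 < hi
        key = "near" if hi - lo <= cutoff else "far"
        for k in ("all", key):
            counts[k] += 1
            hits[k] += covered
    return {k: hits[k] / counts[k] for k in counts}

freq = coverage_by_range()
```

The overall frequency stays near 1/3 (weak exactness), while the frequency within the "near" subset falls below 1/3 and within the "far" subset rises above it, which is the semirelevance described above.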


9. An example of intersecting relevant subsets. In this section some rather 
artificial examples are constructed primarily to illustrate intersecting relevant 
subsets. The examples differ from those preceding in that the proposition A 
is made independent of the observations and the “confidence level” α is allowed 
to be random. The examples raise the question of whether intersecting relevant 
subsets might exist elsewhere, for example, in confidence interval situations. 

Let an experiment consist in drawing one ball from twelve contained in an 
urn. Let A, B, D denote attributes, and let A’, B’, D’ denote the respective 
negations. We suppose that Peter and Paul have knowledge of the contents 
of the urn as shown in Table 1, and that the referee draws one ball and announces 
whether B or B’ is observed (x = B or B’ = observation known to players). 
It is not revealed whether A or A’ is observed (y = A or A’ = unknown ob- 
servation). Peter’s probability assertion concerns the probability that the ball 
drawn has attribute A. The assertion “P(A) = 5/12” is weakly exact. It is 
objectionable on the grounds that the (only) subsets B and B’ are relevant. 

A strongly exact assertion clearly can be made by allowing a variable 
“confidence level”: 

“P(A) = 4/9 if x = B, 1/3 if x = B’.” 

The complexion of the problem is changed if the players are given further 
information. Suppose the twelve balls are also classified in categories D and D’ 
and that the values in Table 2 are also known to the players. The referee 
announces one of four possible results: x = BD, BD’, B’D, or B’D’. It will be seen 
that no probability assertion of the form “P(A) = α(x)” is strongly exact 
inasmuch as the values of P(A | BD), etc., are dependent on an unknown variable 
state of nature, for the two marginal 2 × 2 tables fail to specify a 
unique 2 × 2 × 2 table which would describe the contents completely. Table 3 
shows that there are six possible states of nature which may be obtained by 
putting a = 2, 3 (= number of ABD) and a’ = 1, 2, 3 (= number of A’BD). 

              Table 1                          Table 2 
            A     A’                        A     A’ 
     B      4     5  |  9            D      3     3  |  6 
     B’     1     2  |  3            D’     2     4  |  6 
            5     7  | 12                   5     7  | 12 

              Table 3 
            A        A’ 
     BD     a        a’ 
     BD’    4 − a    5 − a’ 
     B’D    3 − a    3 − a’ 
     B’D’   a − 2    a’ − 1 

Intersecting relevant subsets are obtained from the weakly exact probability 
assertion, “P(A) = 5/12,” for which B and D are relevant, positively biased 
and B’ and D’ are relevant and negatively biased. BD and B’D’ are intersections 
of similarly biased subsets; BD’ and B’D are intersections of dissimilarly biased 
subsets. It is interesting to note that BD itself is not relevant, for with a = 2, 
a’ = 3 one has P(A | BD) = 2/5 < 5/12. Thus Paul has a positive expectation 
if he bets A is true when B is observed, or if he bets A is true when D is observed; 
but he may have a negative expected gain if he bets A is true only when both 
are observed! 
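
The six states of nature can be enumerated directly. The sketch below assumes marginal counts consistent with the text (B: 4 with A and 5 with A’; B’: 1 and 2; D: 3 and 3; D’: 2 and 4), which are the only margins giving P(A | B’) = 1/3, a = 2, 3 and a’ = 1, 2, 3; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def urn_states():
    """All 2x2x2 tables (indexed by a, a') compatible with the assumed margins."""
    states = {}
    for a, a_prime in product(range(6), range(8)):
        # Cell counts forced by the B and D margins once a and a' are chosen.
        with_A = {"BD": a, "BD'": 4 - a, "B'D": 3 - a, "B'D'": a - 2}
        without_A = {"BD": a_prime, "BD'": 5 - a_prime,
                     "B'D": 3 - a_prime, "B'D'": a_prime - 1}
        if min(with_A.values()) >= 0 and min(without_A.values()) >= 0:
            states[(a, a_prime)] = (with_A, without_A)
    return states

def prob_A_given(cell, with_A, without_A):
    """P(A | cell) in one state of nature; None if the cell is empty."""
    total = with_A[cell] + without_A[cell]
    return Fraction(with_A[cell], total) if total else None

states = urn_states()
p_bd = {key: prob_A_given("BD", *tables) for key, tables in states.items()}
```

Here `p_bd[(2, 3)]` is 2/5, below 5/12, while `p_bd[(3, 1)]` is 3/4, above 5/12, so the sign of the bias on BD depends on the unknown state of nature.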

Is there a uniquely appropriate reference set for Peter’s assertions in the last 
example? Use of the universal set ignores information about B and D. Use of 
BD, BD’, B'D, and B’D’ seems inappropriate because the relative frequency 
of A is not known within these subsets. It would seem about equally as appropri- 
ate to use B and B’ as to use D and D’. Thus there appear to be no uniquely 
appropriate subsets. 


10. Summary and conclusions. Statistical inferences having the form, ‘The 
probability that A is true is equal to a” can be studied within the framework 
of a game between two players, one who makes such inferences (or probability 
assertions) and an opponent who questions their validity. The model suggests 
a number of criteria of validity of such inferences; four criteria which have 
been defined and illustrated by examples are: (i) weak exactness, (ii) strong 
exactness, (iii) nonexistence of relevant subsets, and (iv) nonexistence of semi- 
relevant subsets. Definitions of these concepts, too lengthy to be repeated here, 
are given in Sections 2.1, 6.1, 2.2, and 2.2, respectively. Some general observa- 
tions are: 

(i) Weak exactness is a criterion suggested by a familiar requirement of 
Neyman’s in the theory of confidence intervals; in the general model of Section 6 
the definition is extended to apply to more general problems. 

(ii) Strong exactness is a much more severe requirement which can be satis- 
fied in classical Bayesian estimation situations, but which appears to be un- 
reasonably demanding in most non-Bayesian problems. Those who eschew the 
prior distributions of the classical theory pay for weakening the classical as- 
sumptions by losing the property of strong exactness of the inferences. To mistake 
weak exactness for strong exactness is to attribute to an inference a more desirable 
property than it actually possesses. The logical fallacy is neatly stated by Sir 
Macklin [14]: 

“Then I shall demonstrate to you, 
According to the rules of Whately, 
That what is true of all, is true 
Of each, considered separately.” 


(iii) The criterion of nonexistence of relevant subsets is largely inspired by 
some recent work of Fisher. Various examples of relevant subsets have been 
given in order to provide a better understanding of their nature. Nonexistence 
is established only for one simple case; for much more general results the reader 
is referred to Wallace [20]. 

(iv) From the examples of Section 4 it is seen that nonexistence of semirelevant 
subsets is a very severe requirement indeed. One may conjecture that fiducial 
intervals do not induce relevant subsets, but from the example of Student’s t it 
is seen that the same conjecture is not true for semirelevant subsets. 

It is to be hoped that eventually there will be found some generally accepted 
notion of an “appropriate reference set” for inferences. Some readers may find 
that the examples of Sections 3.2, 3.3, and 3.4 indicate that the universal set is 
not always as appropriate as some suitably chosen subset. On this subject, 
Fisher ([11], p. 110) states: 

“If, therefore, any portion of the data were to allow of the recognition of 
such a subset, to which the predicand belongs, a different probability would 
be asserted using the smallest such subset recognizable.” 
Perhaps Fisher’s idea should be formulated mathematically in terms of minimal 
subsets on which probabilities are known independently of the unknown state 
of nature. That uniqueness of appropriate reference sets might be a problem is 
indicated by the example of Section 9, in which two different subdivisions of 
the observation space give different reference sets which are about equal in merit. 

The appropriate reference set has been a subject of controversy in testing 
situations (i.e., tests of significance and tests of hypothesis); contingency tables 
and regression problems (Fisher [11], pp. 82-88) are old examples. Some recent 
examples can be found in Cox [5] and in Cohen [3]. It is to be noted that the 
present development has been based largely on problems of interval estimation. 
The usual translation of criteria to testing situations is of course possible in many 
cases. Thus certain testing situations have been treated implicitly in this work; 
but perhaps the reader will find that whatever force the arguments may have 
for estimation problems is diminished in the translation to testing problems. 


11. Acknowledgement. I wish to thank Prof. Wallace for giving me a draft 
of his paper prior to publication. 
APPENDIX 
Proof of the result of Section 5 


To establish the result of Section 5 it will be assumed that the cumulative 
distribution G(t) = ∫_{−∞}^{t} g(s) ds satisfies the mild restrictions 

\[
\int_{-\infty}^{0} G(t)\,dt < \infty, \qquad
\int_{0}^{\infty} \{1 - G(t)\}\,dt < \infty.
\]

Denote by φ_C(x) the characteristic function of the set C. Then 

\[
a(\theta) = P(C) = \int_{-\infty}^{\infty} \phi_C(x)\,g(x - \theta)\,dx,
\]

\[
b(\theta) = P(AC) = \int_{t_1+\theta}^{t_2+\theta} \phi_C(x)\,g(x - \theta)\,dx.
\]

Further define μ(R) by 

\[
\mu(R) = \int_{-R}^{R} \phi_C(x)\,dx.
\]

Then if μ(∞) is finite, a straightforward calculation shows that A(R) → μ(∞) 
and B(R) → αμ(∞) as R → ∞, which establishes the desired contradiction. 
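If μ(R) → ∞, we may compare A(R) with 

\[
\begin{aligned}
A'(R) &= \int_{-\infty}^{\infty}\int_{-R}^{R} \phi_C(x)\,g(x-\theta)\,dx\,d\theta \\
      &= \int_{-R}^{R} \phi_C(x) \int_{-\infty}^{\infty} g(x-\theta)\,d\theta\,dx \\
      &= \int_{-R}^{R} \phi_C(x)\,dx = \mu(R).
\end{aligned}
\]

The difference A(R) − A′(R) is made up of four integrals described (ignoring 
signs) by: x (or θ) in the range −R to R, and θ (or x) in the range ±R to ±∞. 
All four are bounded by virtue of the assumption on G. A typical calculation 
follows: 

\[
\begin{aligned}
\int_{-R}^{R}\int_{R}^{\infty} \phi_C(x)\,g(x-\theta)\,d\theta\,dx
  &\le \int_{-R}^{R}\int_{R}^{\infty} g(x-\theta)\,d\theta\,dx \\
  &= \int_{-R}^{R}\int_{-\infty}^{x-R} g(t)\,dt\,dx \\
  &= \int_{-R}^{R} G(x-R)\,dx \\
  &= \int_{-2R}^{0} G(x')\,dx'.
\end{aligned}
\]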


Thus A(R) = μ(R) + K(R) where |K(R)| is bounded as R tends to infinity. 
The expression for B(R) may be treated as follows: 

\[
\begin{aligned}
B(R) &= \int_{-R}^{R}\int_{t_1}^{t_2} \phi_C(t+\theta)\,g(t)\,dt\,d\theta \\
     &= \int_{t_1}^{t_2}\int_{-R}^{R} \phi_C(t+\theta)\,g(t)\,d\theta\,dt \\
     &= \int_{t_1}^{t_2}\int_{-R+t}^{R+t} \phi_C(x)\,g(t)\,dx\,dt \\
     &= K'(R) + \int_{t_1}^{t_2}\int_{-R}^{R} \phi_C(x)\,g(t)\,dx\,dt \\
     &= K'(R) + \alpha\mu(R),
\end{aligned}
\]

where K′(R) is the sum of two positive and two negative terms, each term being 
an integral over a region whose area is independent of R. Thus |K′(R)| is 
bounded. Consequently the ratio 

\[
\frac{B(R)}{A(R)} = \frac{\alpha\mu(R) + K'(R)}{\mu(R) + K(R)}
\]

tends to α as R tends to infinity, and the contradiction is established. 


CONDITIONAL CONFIDENCE LEVEL PROPERTIES 


By DAVID L. WALLACE 
University of Chicago 


1. Introduction. Some confidence region procedures have the property that, 
conditionally on the sample point lying in some subset of the sample space, the 
conditional confidence level (i.e. the conditional probability that the region covers 
the parameter) is less than the unconditional confidence level uniformly in the 
parameters. If confidence regions are interpreted as summarizing the knowledge 
of a parameter value obtained from an experiment, such behavior has been con- 
sidered undesirable, particularly when the conditioning subset is in some sense 
irrelevant to the parameter of interest ([2], [3], [4, Chap. IV], [9]; in many of 
these references, the issue is discussed in terms of an associated test of 
significance). Buehler [1] has formalized this behavior and studied numerous examples. 
Tukey [9] has given a somewhat different formalization and obtained a number 
of results as part of a more complex framework for statistical inference. 

In this paper, a class of conditional properties is defined that includes the 
Buehler and Tukey definitions. Sufficient conditions for a confidence procedure 
to possess various properties are obtained. The main result is that if a level α 
confidence procedure yields, for all samples, posterior probability α for some 
prior probability distribution on the parameter space, then there are no subsets 
of the sample space with respect to which the conditional confidence is 
uniformly less (or greater) than α. A much more widely applicable, but slightly 
weaker, result is obtained if a sequence of prior distributions is used. The results 
apply to most of the classical confidence problems, including discrete 
distribution problems and nuisance parameter problems such as the Behrens-Fisher problem. 

Confidence procedures for which no conditional confidence can be uniformly 
less (or greater) than and bounded away from the nominal level include the 
usual t, χ², F, Pitman conditional location and scale, and Behrens-Fisher 
procedures. The “uniformly less” conclusion applies to the one-sided binomial and 
Poisson procedures. 

Definitions and terminology are given in Section two, results are stated in 
Section three and proved in Sections five and six, and examples are given in 
Section four. 


2. Notation and definitions. Let Z be a sample space, Ω a parameter space, 
and Y = Z × Ω their Cartesian product. For any set C in Y, let 
C_{z·} = {ω: (z, ω) ∈ C} and C_{·ω} = {z: (z, ω) ∈ C} denote the cross section sets. 
Let (Z, 𝒜, μ) and (Ω, ℬ, λ) be measure spaces with σ-finite measures μ, λ. Let 
p be a measurable function on Z × Ω, such that for each ω ∈ Ω, p_ω(·) = p(· | ω) 
is a probability density on Z relative to μ. The function p is called, by Tukey 
[9], a specification. The specification p will be fixed throughout this paper, though 
other related specifications will be used. Denote by P_ω and E_ω respectively the 
probability measure and expectation determined by p_ω on Z. 

A density function ζ on Ω relative to the measure λ will be called a prior 
density. More generally, if ζ is any nonnegative function on Ω, not identically 
zero, ζ will be called a prior quasi-density. A prior quasi-density ζ will be said 
to be admissible with respect to the specification p, if 

\[
h_\zeta(z) = \int \zeta(\omega)\,p(z \mid \omega)\,d\lambda(\omega) < \infty
\]

for all z ∈ Z except for a set of μ measure zero. A prior quasi-density ζ will be 
said to be admissible except on the set A with respect to the specification p, where 
A is any subset of Z, if h_ζ(z) < ∞ for all z ∈ Z − A except for a set of μ 
measure zero. Every prior quasi-density for which a constant multiple is a prior 
density is admissible with respect to any specification. 

For every prior quasi-density ζ and every z ∈ Z for which 0 < h_ζ(z) < ∞, 
there is defined a density g_ζ(· | z) on Ω relative to λ: 

\[
g_\zeta(\omega \mid z) = \frac{p(z \mid \omega)\,\zeta(\omega)}{h_\zeta(z)}.
\]

(That g_ζ is undefined for some z will not matter.) If ζ or some constant multiple 
of ζ is a prior density, g_ζ(· | z) is the posterior density given by Bayes’ theorem. 
If not, g_ζ(· | z) is still a probability density on Ω, but will be called here a weak 
posterior density. (Some useful simple properties of weak posterior densities are 
set forth in Section five.) If ζ is a prior density, then h_ζ(·) is a (marginal) density 
on Z relative to μ. 

A confidence procedure is a measurable set C in the product space Z × Ω with 
the interpretative rule that to each z, the confidence set C_{z·} = {ω: (z, ω) ∈ C} 
in Ω is assigned. Tukey calls C an event. No restrictions concerning confidence 
level will be placed in the definition of a confidence procedure. 

A confidence procedure C is said to be level α Bayes against ζ with respect to 
the specification p (written C is B(α, ζ, p)) if some constant multiple of ζ is a 
prior density on Ω and if, for each z ∈ Z for which g_ζ(· | z) is defined, the set 
C_{z·} has probability α under the posterior density g_ζ(· | z). 

A confidence procedure C is said to be level α weak Bayes against ζ with respect 
to the specification p (written C is B*(α, ζ, p)) if ζ is an admissible prior 
quasi-density on Ω and no multiple of ζ is a prior density, and if, for each z ∈ Z for 
which g_ζ(· | z) is defined, the set C_{z·} has probability α under the weak posterior 
density g_ζ(· | z). 

A confidence procedure C is said to be lower level α weak Bayes against ζ with 
respect to the specification p (written C is B**(α, ζ, p)) if ζ is a prior 
quasi-density on Ω admissible except for a set A (which may be empty), such that 
C_{z·} = Ω for all z ∈ A and C_{z·} has a probability of at least α under the weak 
posterior density g_ζ(· | z) for all z for which g_ζ is defined. 

Following Tukey [9], define a selection as a function k mapping Z into the 
unit interval such that E_ω(k) > 0 for all ω ∈ Ω.² Let 

\[
p^{(k)}(z \mid \omega) = \frac{p(z \mid \omega)\,k(z)}{E_\omega(k)}.
\]

p^{(k)} is a specification and will be called the selected (by k) specification. Denote 
by P_ω^{(k)}, E_ω^{(k)}, g_ζ^{(k)} the functions for the selected specification corresponding to 
P_ω, E_ω, g_ζ. 

Selection has the interpretation that in any conceptual infinite sequence of 
observation and parameter pairs, {(z_ν, ω_ν); ν = 1, 2, ...}, a new sequence is 
obtained as the subsequence in which the pair (z_ν, ω_ν) is retained according to 
the outcome of a chance process with retention probability k(z_ν). The process 
is assumed independent for each pair. If k(z) takes only the values 0 and 1 
(pure selection), the pair is retained or not according as z_ν does or does not belong to the 
set D = {z: k(z) = 1} and the selected specification consists of the family of 
densities p_ω(·) truncated to the sample subspace D. 

Define, now, a number of performance properties of a confidence procedure 
C. 

1. C has property c(α), called exact confidence α, if for all ω ∈ Ω, P_ω(C_{·ω}) = α. 

2. C has property c̲(α), called lower confidence α, if 

\[ \inf_{\omega \in \Omega} P_\omega(C_{\cdot\omega}) = \alpha. \]

3. C has property c̄(α), called upper confidence α, if 

\[ \sup_{\omega \in \Omega} P_\omega(C_{\cdot\omega}) = \alpha. \]

4. C has advance probability α if it has exact confidence α, and if, for any 
selection k for which P_ω^{(k)}(C_{·ω}) = q for all ω ∈ Ω, q = α. 

5. C has strong advance probability α if it has advance probability α, and if, 
for any selections k_1, k_2 for which 

\[ P_\omega^{(k_1)}(C_{\cdot\omega}) \le P_\omega^{(k_2)}(C_{\cdot\omega}) \]

for all ω ∈ Ω, equality holds for all ω ∈ Ω. 

6. C has property S_0(α) if, for every selection k, 

\[ \alpha \le \sup_{\omega \in \Omega} P_\omega^{(k)}(C_{\cdot\omega}). \]

7. C has property S_1(α) if, for every selection k, 

\[ \inf_{\omega \in \Omega} P_\omega^{(k)}(C_{\cdot\omega}) \le \alpha
   \le \sup_{\omega \in \Omega} P_\omega^{(k)}(C_{\cdot\omega}). \]

8. C has property S_2(α) if, for every selection k, there exist parameter values 
ω_1, ω_2 such that 

\[ P_{\omega_1}^{(k)}(C_{\cdot\omega_1}) \le \alpha \le P_{\omega_2}^{(k)}(C_{\cdot\omega_2}). \]

9. C has property S_3(α) if, for every selection k for which P_ω^{(k)}(C_{·ω}) ≤ α 
for all ω or else P_ω^{(k)}(C_{·ω}) ≥ α for all ω, equality holds for all ω. 

10. C has property S_4(α) if it has property S_3(α) and if, for every pair of 
selections k_1, k_2 for which 

\[ P_\omega^{(k_1)}(C_{\cdot\omega}) \le P_\omega^{(k_2)}(C_{\cdot\omega}) \]

for all ω, equality holds for all ω. 

² The restriction on positivity seems possibly too strong, except when the positive 
domain of p(· | ω) is the same for all ω. 

The properties have evident interrelations of which the most important are 

strong advance probability α ⇒ advance probability α ⇒ c(α), 

S_4(α) ⇒ S_3(α) ⇒ S_2(α) ⇒ S_1(α) ⇒ S_0(α). 

If C has property c(α), then 

strong advance probability α ⇒ S_4(α) ⇒ ··· ⇒ S_1(α) ⇒ advance probability α. 

The ordinary term confidence coefficient α usually means exact confidence α, 
or sometimes lower confidence α. Tukey [9] introduced the sequence frequency 
(equivalent to exact confidence), advance probability, and strong advance 
probability to describe successively stronger properties of a confidence 
procedure in retaining “level α” under selections. The properties S_1(α) and S_2(α) 
for pure selections have been defined and studied by Buehler [1]. He names the 
selections violating the defining condition rather than the property. The 
principal reason for introducing the sequence {S_i(α)} is to permit differentiation of 
behavior of common confidence procedures. The unsymmetric property S_0(α) 
seems of interest in much the same “conservative” way that lower confidence 
c̲(α) is of interest. 

Buehler’s examples, combined with the examples and results of this paper, 
seem to indicate the need for properties intermediate to advance probability 
and strong advance probability, and even suggest that strong advance 
probability may be so strong and rare as to be of little value. 


3. Principal results. 


THEOREM 1: Let C be a confidence procedure which is level α Bayes against ζ 
with respect to the specification p for some ζ. Then C is S_2(α). If in addition, 
ζ is positive on Ω, C is S_3(α). 

COROLLARY 1: A confidence procedure C which has lower (or upper) confidence 
α, but not exact confidence α, is never level α Bayes against any ζ positive on Ω. 

THEOREM 2: Let C be a confidence procedure which is level α weak Bayes against 
ζ with respect to the specification p for some ζ. Then C is S_1(α). 

COROLLARY 2: If C has exact confidence α and is level α weak Bayes against ζ, 
then C has advance probability α. 

COROLLARY 3: If a fiducial distribution for ω for the sample point z has density 
f(· | z) which is a weak posterior density g_ζ(· | z) with respect to some admissible 
prior quasi-density for all z, then any confidence procedure giving fiducial 
probability α for every z has the property S_1(α). 

This specifically includes results of “integrating out” nuisance parameters. 
Such procedures will not, in general, have any of the confidence level properties 
c(α), c̲(α) or c̄(α). The result is not at all dependent on the problems of 
construction or meaning of fiducial distributions and fiducial probability. 

The results in examples (a), (b) of Section four could have been obtained 
using results of Fisher and of Jeffreys ([5], [6]) together with Corollary 3, but 
it seems preferable to derive directly the facts necessary to apply Theorem 2. 

Confidence procedures for functions of ω with nuisance parameters are easily 
handled directly by Theorems 1 and 2 and Corollary 2. For a more explicit 
treatment in an important special case, suppose ω = (θ, φ) with Ω = Θ × Φ. A 
confidence procedure C will be called a confidence procedure for θ if C is a cylinder 
set with base C* in Z × Θ. Assume that the measure λ on Ω is a product measure 
λ_1 × λ_2 of measures on Θ and Φ. 

COROLLARY 4: If a confidence procedure C for θ with base C* in Z × Θ has the 
property that 

\[ C^*_{z\cdot} = \{\theta : (z, \theta) \in C^*\} \]

has, for each z, probability α under the marginal distribution on Θ of a weak 
posterior density, then C is S_1(α). 

THEOREM 3: Let C be a confidence procedure which is lower level α weak Bayes 
against ζ with respect to the specification p for some ζ. Then C is S_0(α). 

Proofs are given in Section six. 


4. Examples. In examples (a), (b) and (c), Z is a Euclidean space with μ 
Lebesgue measure. In examples (d) and (e), Z is the nonnegative integers with 
counting measure. In all examples, Ω is a Euclidean space (or obvious subspace) 
with λ Lebesgue measure. 

(a) Normal. Let Z be n-dimensional Euclidean space with coordinates 
independently and identically distributed as N(θ, σ²). Let ω = (θ, σ), z̄ = Σ z_i/n, 
S = Σ (z_i − z̄)². 

(i) σ known. With admissible prior quasi-density ζ(θ) = 1, the weak posterior 
density for θ is 

\[
g_\zeta(\theta \mid z) = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\,
  \exp\!\left\{-\frac{n(\theta - \bar z)^2}{2\sigma^2}\right\}.
\]

Hence any confidence procedure with confidence sets of the form 

\[ \{\theta : \theta - \bar z \in A_1\} \]

with A_1 a set on the real line with probability α under the distribution N(0, σ²/n) 
will have exact confidence α, be weak Bayes and S_1(α) and have advance 
probability α. 

Not all such procedures are S_2(α). For let the set A_1 be any half-infinite 
interval, say (−∞, a). Let k be a pure selection retaining the point z if z̄ < b. 
The conditional confidence level 

\[ P_\theta\{z : \theta - \bar z < a \mid \bar z < b\} < \alpha \]

for all θ and b. The complementary selection gives conditional confidence 
greater than α for all θ. S_1(α) guarantees that the conditional confidence is not 
uniformly below (or above) and bounded away from α. I do not know if 
confidence procedures with A_1 a finite interval must have the property S_2(α). 
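
The conditional confidence level under this pure selection has a closed form, which can be evaluated directly. The sketch below is illustrative; the function names and the particular values of a, b and θ are hypothetical, and σ and n default to 1.

```python
from statistics import NormalDist

def nominal_level(a, sigma=1.0, n=1):
    """alpha = P_theta(theta - zbar < a), with zbar ~ N(theta, sigma^2 / n)."""
    return 1.0 - NormalDist(0.0, sigma / n ** 0.5).cdf(-a)

def conditional_confidence(theta, a, b, sigma=1.0, n=1):
    """P_theta(theta - zbar < a | zbar < b) for the selection retaining zbar < b."""
    zbar = NormalDist(theta, sigma / n ** 0.5)
    p_select = zbar.cdf(b)
    # The set is covered and retained exactly when theta - a < zbar < b.
    p_both = max(0.0, zbar.cdf(b) - zbar.cdf(theta - a))
    return p_both / p_select

alpha = nominal_level(a=1.0)
levels = [conditional_confidence(theta, a=1.0, b=0.5) for theta in (-3, -1, 0, 1, 3)]
```

Every entry of `levels` lies below `alpha`, but the value at θ = −3 is already within 0.001 of it: the conditional level is uniformly less than α without being bounded away from α, so S_2(α) fails while S_1(α) is not contradicted.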

(ii) σ unknown, n ≥ 2. With admissible prior quasi-density ζ(θ, σ) = 1/σ, 
the weak posterior density is 

\[
g_\zeta(\theta, \sigma \mid z) =
  \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\,e^{-n(\theta - \bar z)^2/2\sigma^2}
  \cdot \frac{2}{\Gamma\!\left(\frac{n-1}{2}\right)}
  \left(\frac{S}{2}\right)^{(n-1)/2} \sigma^{-n}\,e^{-S/2\sigma^2}.
\]

This can be best described as follows: (θ, σ) given z is distributed such that 
S/σ² is distributed as chi-square on n − 1 degrees of freedom and, conditional 
on σ, θ is N(z̄, σ²/n). The marginal distribution of θ given z is such 
that (θ − z̄)√(n(n − 1)/S) is distributed as Student’s t on n − 1 degrees of 
freedom. 

Any confidence procedure for θ with confidence sets of the form 

\[ \{\theta : (\theta - \bar z)\sqrt{n(n-1)/S} \in A_2\} \]

with A_2 a set on the real line with probability α under the t_{n−1} distribution will 
have exact confidence α, and, using Corollary 4, will be weak Bayes level α with 
respect to ζ. The procedure is then S_1(α) and has advance probability α. 
Buehler [1] has noted that no such procedure is S_2(α), a pure selection 
according as S ≤ c giving conditional confidence uniformly less than α. 
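
The effect of selecting on small S can be seen in a small simulation. The sketch below is illustrative only: it assumes n = 2 and α = 1/2, so that A_2 = (−1, 1) is the interquartile range of the t distribution on one degree of freedom, and the cutoff c, the parameter values, and the function name are hypothetical.

```python
import random

def t_coverage(c, theta=0.0, sigma=1.0, n_trials=200_000, seed=1):
    """Return (overall coverage, coverage among samples retained because S <= c)."""
    rng = random.Random(seed)
    covered_all = kept = covered_kept = 0
    for _ in range(n_trials):
        z1, z2 = rng.gauss(theta, sigma), rng.gauss(theta, sigma)
        zbar = (z1 + z2) / 2.0
        S = (z1 - zbar) ** 2 + (z2 - zbar) ** 2
        # |theta - zbar| * sqrt(n(n-1)/S) < 1 with n = 2, written without division.
        covered = 2.0 * (theta - zbar) ** 2 < S
        covered_all += covered
        if S <= c:                      # pure selection on small S
            kept += 1
            covered_kept += covered
    return covered_all / n_trials, covered_kept / kept

overall, conditional = t_coverage(c=0.5)
```

With these settings the overall frequency is close to 1/2 while the frequency among retained samples is markedly lower. As σ shrinks relative to c almost every sample is retained and the conditional level approaches α from below, which is consistent with the procedure being S_1(α) but not S_2(α).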
Any confidence procedure for σ with confidence sets of the form 

\[ \{\sigma : S/\sigma^2 \in A_3\} \]

with A_3 a set on the positive real line with probability α under the χ²_{n−1} 
distribution, will have exact confidence α, be weak Bayes, and hence S_1(α) and have 
advance probability α. 

These results are all special cases of example (c) on location and scale param- 
eters. 

(b) The Behrens-Fisher problem. Let Z be n_1 + n_2 dimensional Euclidean 
space, with coordinates independently distributed, the first n_1 identically as 
N(θ_1, σ_1²), the last n_2 identically as N(θ_2, σ_2²). Let ω = (θ_1, σ_1, θ_2, σ_2) and let 
z̄_1, z̄_2, S_1, S_2 be the means and sums of squares of deviations of the two sets of 
coordinates. Assume n_1 ≥ 2, n_2 ≥ 2. 

With the admissible prior quasi-density ζ(ω) = 1/(σ_1 σ_2), the weak posterior 
distributions of (θ_1, σ_1) and (θ_2, σ_2) are independent and as obtained in example 
(a, ii) with appropriate (n_i, z̄_i, S_i). 

The usual confidence procedures for σ_2/σ_1, with confidence sets of the form 

\[
\left\{ \sigma_2/\sigma_1 :
  \frac{S_1}{(n_1 - 1)\sigma_1^2} \cdot \frac{(n_2 - 1)\sigma_2^2}{S_2} \in A_4 \right\}
\]

with A_4 having probability α under the F_{n_1−1, n_2−1} distribution will have exact 
confidence α, be weak Bayes and S_1(α) and have advance probability α. 

The marginal weak posterior distribution for θ_1 − θ_2 is easily found to be 
such that 

\[
\frac{(\theta_1 - \theta_2) - (\bar z_1 - \bar z_2)}{\sqrt{a_1 + a_2}}
\]

is distributed as the linear combination of independent Student’s variates 
t_{n_1−1} sin θ − t_{n_2−1} cos θ, where a_i = S_i/[n_i(n_i − 1)] and θ = tan⁻¹[a_1/a_2]^{1/2}. This 
distribution can usefully be called the Behrens-Fisher distribution with 
parameters n_1 − 1, n_2 − 1 and θ, written BF(n_1 − 1, n_2 − 1; θ). (The usefulness 
of this terminology is illustrated in the paper [10] in which a more detailed 
related treatment of the Behrens-Fisher problem is given.) 

Any confidence procedure for θ_1 − θ_2 with confidence sets of the form 

\[
\left\{ \theta_1 - \theta_2 :
  \frac{(\theta_1 - \theta_2) - (\bar z_1 - \bar z_2)}{\sqrt{a_1 + a_2}} \in A_5 \right\}
\]

with A_5 a set having probability α under the distribution BF(n_1 − 1, n_2 − 1; θ) 
will be weak Bayes level α, and have the property S_1(α). Since the marginal 
weak posterior density for θ_1 − θ_2 is exactly the fiducial density for θ_1 − θ_2 
under the Behrens-Fisher solution, any fiducial procedure with fiducial 
probability α has the property S_1(α). Such procedures are known not to have exact 
confidence α, but, at least for n_1 and n_2 sufficiently large, to have lower 
confidence α and not be S_2(α). The behavior for small n_1 and n_2 is unclear. (Cf. [10].) 

The Welch asymptotic procedure with asymptotically exact confidence α does 
not possess property S_1(α). Fisher’s criticism ([3]) of this procedure amounts 
effectively to showing that, for n_1 = n_2 = 7, a pure selection with retention if 
| (S_1/S_2) − 1 | < δ for δ small gives conditional confidence uniformly below 
and bounded away from α. For n_1 and n_2 sufficiently large, asymptotic theory 
suffices to show that a selection with retention if 

\[
\left| \frac{S_1 n_2 (n_2 - 1)^2}{S_2 n_1 (n_1 - 1)^2} - 1 \right| < \delta
\]

for δ small has a similar effect. Calculations for small and moderate values of 
n_1 and n_2 indicate that the effect holds fairly generally. 

(c) Location and scale parameter families. Let Z be n-dimensional Euclidean 
space, let ω = (θ, σ) with θ a (real) location parameter, σ a (positive) scale 
parameter for the family p of distributions. Let e = (1, ..., 1). Then 

\[
p(z \mid \omega) = \sigma^{-n}\,q\!\left(\frac{z - \theta e}{\sigma}\right)
\]

for a fixed density q. 

(i) σ = 1, known. The prior quasi-density ζ_1(θ) = 1 is admissible 
(∫ q(z − θe) dθ < ∞ except on sets of measure zero) for all q, and the weak 
posterior density of θ is 

\[
g_{\zeta_1}(\theta \mid z) = \frac{q(z - \theta e)}{\int q(z - \gamma e)\,d\gamma}.
\]

Let C be a confidence procedure which is level α weak Bayes with respect to ζ_1 
and which also possesses the translation property: (z, θ) ∈ C if and only if 
(z + ae, θ + a) ∈ C for all a, or equivalently, if and only if z − θe ∈ C_{·0}. The 
translation property guarantees that the confidence level is constant; for 

\[
\int_{C_{\cdot\theta}} q(z - \theta e)\,d\mu(z) = \int_{C_{\cdot 0}} q(z)\,d\mu(z)
\]

and since C is S_1(α), then it must have exact confidence α, and have advance 
probability α. Such a procedure is a conditional procedure with translation 
property as constructed by Pitman [7], by choosing a set with conditional 
probability α under the conditional distribution of z − θe on each configural line 
determined by the differences {z_i − z_1, i = 2, ..., n} of the sample point. 
Buehler [1] proved the S_1(α) result when n = 1. 

(ii) θ = 0, known. The prior quasi-density ζ_2(σ) = 1/σ is admissible for all 
q, and the weak posterior density of σ is 

\[
g_{\zeta_2}(\sigma \mid z) =
  \frac{\sigma^{-(n+1)}\,q(z/\sigma)}{\int_0^\infty \tau^{-(n+1)}\,q(z/\tau)\,d\tau}.
\]

Again, a confidence procedure which is level α weak Bayes with respect to ζ_2 
and has the natural property under scale change, has exact confidence α, is 
S_1(α) and has advance probability α. It is a Pitman scale procedure, with 
conditional confidence α on each configural ray from the origin. 

(iii) For n ≥ 2, the prior quasi-density ζ_3(θ, σ) = 1/σ is admissible for all q 
and the weak posterior density of (θ, σ) is 

\[
g_{\zeta_3}(\theta, \sigma \mid z) =
  \frac{\sigma^{-(n+1)}\,q\!\left(\dfrac{z - \theta e}{\sigma}\right)}
       {\displaystyle\int_0^\infty\!\!\int_{-\infty}^{\infty}
        \tau^{-(n+1)}\,q\!\left(\dfrac{z - \gamma e}{\tau}\right) d\gamma\,d\tau}.
\]

Let C be a confidence procedure which is level α weak Bayes with respect to ζ_3 
and which has the translation-scale change property that (z, (θ, σ)) ∈ C if and 
only if (z − θe)/σ ∈ C_{·(0,1)}. Any such procedure will be S_1(α), have exact 
confidence α and advance probability α. Included are confidence procedures for θ 
and σ jointly and for θ or σ separately. For the latter, it suffices to determine 
each C_{z·} according to the marginal posterior densities. Again, the procedures 
are just those of Pitman. 

In all examples, the prior quasi-densities were those of the Haar measure on 
the appropriate group of translation and/or scale changes. If Ω is any σ-compact 
group of transformations on Z, and if ζ(ω) dλ(ω) is Haar measure on Ω, then a 
confidence procedure C can be obtained which is weak Bayes level α and which 
satisfies the standard invariance condition under the group Ω. The latter 
condition ensures that the procedure has exact confidence not depending on ω and, 
from the former, the procedure is S_1(α), hence has exact confidence α and 
advance probability α. To prove that the Haar measure is admissible and to show 
that the weak posterior density is for each fixed ω the conditional density on 
each orbit in Z under the group Ω, seems to require slightly more structure. 

(d) Binomial. Let $p_\omega$ be the binomial $(n, \omega)$ density. The usual
one-sided confidence interval $(l_1(z), 1)$ for $\omega$ with lower confidence
$\alpha$ is obtained for $z > 0$ as the root of the equation

$$\sum_{i=0}^{z-1} p(i \mid l_1) = \alpha$$

or, equivalently, of the incomplete beta equation

$$\frac{1}{B(z,\, n - z + 1)} \int_{l_1}^{1} \omega^{z-1} (1 - \omega)^{n-z}\, d\omega = \alpha.$$

For $z = 0$, $l_1(0) = 0$. The weak posterior density with respect to the prior
quasi-density $\zeta_1(\omega) = \omega^{-1}$ (i.e., a beta quasi-distribution with
parameters (0, 1)) is

$$g_{\zeta_1}(\omega \mid z) = \frac{\omega^{z-1} (1 - \omega)^{n-z}}{B(z,\, n - z + 1)}, \qquad z = 1, 2, \ldots, n.$$

Since $\zeta_1$ is admissible except for $z = 0$, for which $C_{0\cdot} = \Omega$, the
confidence procedure is lower level $\alpha$ weak Bayes and is $S_0(\alpha)$.
Selection by the set $\{z = 0\}$ shows that the procedure is not $S_1(\alpha)$.

Similarly, the usual confidence interval $(0, l_2(z))$ with lower confidence
$\alpha$ is lower level $\alpha$ weak Bayes with respect to the prior quasi-density
$\zeta_2(\omega) = (1 - \omega)^{-1}$, and is $S_0(\alpha)$ but not $S_1(\alpha)$.

The usual two-sided confidence interval $(l_1(z), l_2(z))$ combining the two
one-sided lower confidence $\alpha$ procedures is a much more complex procedure
having lower confidence depending on $n$ and $\alpha$, but lying between
$2\alpha - 1$ and $\alpha$. The nominal lower level $2\alpha - 1$ is achieved only
for rare combinations of $n$ and $\alpha$, so that, a fortiori, the two-sided
procedure is not usually $S_1(2\alpha - 1)$.
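
The equation defining the one-sided binomial limit $l_1(z)$ can be solved numerically. The following is a minimal sketch, assuming only the Python standard library; the names `binom_cdf` and `lower_limit` are illustrative and do not come from the paper. It locates the root by bisection on the binomial sum.

```python
from math import comb


def binom_cdf(k, n, w):
    """P(X <= k) for X ~ Binomial(n, w)."""
    return sum(comb(n, i) * w**i * (1 - w) ** (n - i) for i in range(k + 1))


def lower_limit(z, n, alpha, tol=1e-12):
    """Root l1 of sum_{i<z} p(i | l1) = alpha; by convention l1(0) = 0."""
    if z == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    # P(X <= z-1 | w) decreases from 1 to 0 as w goes from 0 to 1.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(z - 1, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For $z = n$ the equation reduces to $1 - l_1^n = \alpha$, so $l_1(n) = (1 - \alpha)^{1/n}$, which gives a closed-form check on the routine.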

(e) Poisson. Let $p_\omega$ be the Poisson density with mean $\omega$. As with the
binomial, the usual one-sided procedure for $\omega$ with intervals of the form
$(l_1(z), \infty)$ and with lower confidence $\alpha$ is lower level $\alpha$ weak
Bayes against the prior quasi-density $\zeta_1(\omega) = \omega^{-1}$, and the
procedure is $S_0(\alpha)$. Since $l_1(0) = 0$, it is not $S_1(\alpha)$. However, the
other one-sided procedure does not suffer from the end effect and the interval
$(0, l_2(z))$ determined for all $z$ from the equation

$$\sum_{i=z+1}^{\infty} p(i \mid l_2) = \alpha$$

or from the equation

$$\frac{1}{\Gamma(z + 1)} \int_0^{l_2} \omega^{z} e^{-\omega}\, d\omega = \alpha$$

gives a confidence procedure with lower confidence $\alpha$ which is level $\alpha$
weak Bayes against the prior quasi-density $\zeta_2(\omega) = 1$ and hence has the
property $S_1(\alpha)$.
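
The Poisson upper limit $l_2(z)$ can be computed in the same way. This is a sketch under the assumption that bisection on the Poisson tail sum is acceptable; `poisson_cdf` and `upper_limit` are hypothetical helper names introduced for illustration.

```python
from math import exp


def poisson_cdf(k, w):
    """P(X <= k) for X ~ Poisson(w), summed term by term."""
    term = exp(-w)
    total = term
    for i in range(1, k + 1):
        term *= w / i
        total += term
    return total


def upper_limit(z, alpha, tol=1e-12):
    """Root l2 of P(X >= z+1 | l2) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    # Expand the bracket until the upper tail probability reaches alpha.
    while 1 - poisson_cdf(z, hi) < alpha:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - poisson_cdf(z, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For $z = 0$ the equation is $1 - e^{-l_2} = \alpha$, so $l_2(0) = -\ln(1 - \alpha) > 0$: the interval is non-degenerate even at the smallest observation, which is the absence of an end effect noted in the text.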


5. A property of prior quasi-densities and weak posterior densities. 

THEOREM 4: If $\zeta$ is a prior quasi-density with $\int_\Omega \zeta(\omega)\, d\lambda = \infty$,
with corresponding weak posterior density $g_\zeta(\cdot \mid z)$, then there exists a
sequence of prior densities $\{\xi_n\}$ with corresponding posterior densities
$g_n(\cdot \mid z)$ such that for all $\omega \in \Omega$ and all $z$ for which $g_\zeta$
is defined,

$$\lim_{n \to \infty} g_n(\omega \mid z) = g_\zeta(\omega \mid z).$$

Further, if $\{\xi_n\}$ is any sequence of prior densities with corresponding
posterior densities $\{g_n(\cdot \mid z)\}$ such that there exist constants $K$ and
$\{a_n;\ n = 1, 2, \ldots\}$ such that for all $\omega$,

$$\lim_{n \to \infty} a_n \xi_n(\omega) = \zeta(\omega), \tag{5.1}$$

$$a_n \xi_n(\omega) \leq K \zeta(\omega), \tag{5.2}$$

then

$$\lim_{n \to \infty} g_n(\omega \mid z) = g_\zeta(\omega \mid z).$$

In the second part of the theorem, condition (5.2) is necessary in that
sequences $\{\xi_n\}$ can be found that satisfy condition (5.1) but for which
$\{g_n(\omega \mid z)\}$ does not converge, or converges but not to a probability
density.

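The second part will be proved first. For all $z$ for which the right-hand
denominator is finite and positive,

$$g_n(\omega \mid z) = \frac{a_n\, p(z \mid \omega)\, \xi_n(\omega)}{a_n \int_\Omega p(z \mid u)\, \xi_n(u)\, d\lambda(u)}.$$

Under conditions (5.1) and (5.2), numerator and denominator converge
respectively to the numerator and denominator of $g_\zeta(\omega \mid z)$ for all
$\omega$ and for every $z$ for which $g_\zeta$ is defined. For the denominator this
is dominated convergence: by (5.2) the integrand is bounded by
$K p(z \mid u) \zeta(u)$, which is integrable whenever $g_\zeta$ is defined.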

The first part of the theorem will be proved by exhibiting a sequence $\{\xi_n\}$
satisfying conditions (5.1) and (5.2). Since $\lambda$ is $\sigma$-finite, there exists
an increasing sequence of sets in $\Omega$: $\{B_n;\ n = 1, 2, \ldots\}$ such that
$\lim B_n = \Omega$ and $\lambda(B_n) < \infty$. Define

$$\xi_n(\omega) = \begin{cases} \dfrac{\min[n, \zeta(\omega)]}{\int_{B_n} \min[n, \zeta(u)]\, d\lambda(u)}, & \omega \in B_n, \\[2ex] 0, & \omega \notin B_n. \end{cases}$$

$\xi_n$ clearly satisfies the two conditions with $K = 1$ and

$$a_n = \int_{B_n} \min[n, \zeta(u)]\, d\lambda(u).$$
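
The truncation construction can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the paper: it takes the Poisson specification with the flat quasi-density $\zeta(\omega) = 1$ on $(0, \infty)$, chooses $B_n = (0, n]$ so that $\xi_n$ is uniform on $B_n$ with $a_n = n$, and compares the resulting proper posterior with the weak posterior $\omega^z e^{-\omega}/z!$. The function names are illustrative.

```python
from math import exp, factorial


def weak_posterior(w, z):
    """Weak posterior for Poisson data z under the flat quasi-density zeta = 1."""
    return w**z * exp(-w) / factorial(z)


def truncated_posterior(w, z, n):
    """Posterior under the proper prior xi_n = 1/n on B_n = (0, n]."""
    if w > n:
        return 0.0
    # Closed form of the integral of u^z e^{-u} over (0, n] for integer z.
    tail = exp(-n) * sum(n**k / factorial(k) for k in range(z + 1))
    mass = factorial(z) * (1 - tail)
    return w**z * exp(-w) / mass


# Absolute error at a fixed point as the truncation set B_n grows.
errors = [abs(truncated_posterior(2.0, 3, n) - weak_posterior(2.0, 3))
          for n in (5, 10, 20, 40)]
```

Here $a_n \xi_n(\omega) = 1$ on $B_n$, so both (5.1) and (5.2) hold with $K = 1$, and the list `errors` shrinks toward zero as $n$ increases.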


6. Proofs of results of section three. 
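LEMMA 1: If a confidence procedure $C$ is $B(\alpha, \xi, p)$ then for every
selection $k$, $C$ is $B(\alpha, \xi_k, p^{(k)})$, with prior density

$$\xi_k(\omega) = \frac{\xi(\omega)\, E_\omega(k)}{\int_\Omega \xi(u)\, E_u(k)\, d\lambda(u)}.$$

LEMMA 2: If a confidence procedure $C$ is $B^*(\alpha, \zeta, p)$ (or
$B^{**}(\alpha, \zeta, p)$), then for every selection $k$, $C$ is
$B^*(\alpha, \zeta_k, p^{(k)})$ (or $B^{**}(\alpha, \zeta_k, p^{(k)})$) with prior
quasi-density

$$\zeta_k(\omega) = \zeta(\omega)\, E_\omega(k).$$

By assumption in Lemma 1,

$$\frac{\int_{C_{z\cdot}} \xi(\omega)\, p(z \mid \omega)\, d\lambda(\omega)}{\int_\Omega \xi(u)\, p(z \mid u)\, d\lambda(u)} = \alpha.$$

But

$$\xi_k(\omega)\, p^{(k)}(z \mid \omega) = c\, k(z)\, \xi(\omega)\, p(z \mid \omega),$$

with $c$ a positive constant not depending on $\omega$, so that

$$\frac{\int_{C_{z\cdot}} \xi_k(\omega)\, p^{(k)}(z \mid \omega)\, d\lambda(\omega)}{\int_\Omega \xi_k(u)\, p^{(k)}(z \mid u)\, d\lambda(u)} = \alpha.$$

The proof of Lemma 2 is the same, with the exceptional set for admissibility
unchanged for the selected specification.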

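Let $\chi_C$ denote the set characteristic function of the set $C$ in
$Z \times \Omega$. By the usual conditional expectation interchange of order of
integration, for any prior density $\xi$ and specification $p$,

$$\begin{aligned}
\int_\Omega P_\omega(C_{\cdot\omega})\, \xi(\omega)\, d\lambda(\omega)
 &= \int_{Z \times \Omega} \chi_C(z, \omega)\, p(z \mid \omega)\, \xi(\omega)\, d[(\mu \times \lambda)(z, \omega)] \\
 &= \int_Z h_\xi(z) \left[ \int_\Omega \chi_C(z, \omega)\, g_\xi(\omega \mid z)\, d\lambda(\omega) \right] d\mu(z) \\
 &= \int_Z h_\xi(z) \left[ \int_{C_{z\cdot}} g_\xi(\omega \mid z)\, d\lambda(\omega) \right] d\mu(z).
\end{aligned} \tag{6.1}$$

Note that the lack of definition of $g_\xi$ for those $z$ for which
$h_\xi(z) = 0$ is of no consequence.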
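
Equation (6.1) can be checked exactly on a small discrete case. The sketch below is illustrative only: it assumes a binomial specification with $n = 3$, a three-point parameter space with counting measure, a uniform prior, and an arbitrarily chosen confidence procedure `C`. All names and numerical choices are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

omegas = [Fraction(1, 5), Fraction(1, 2), Fraction(4, 5)]  # finite parameter space
prior = {w: Fraction(1, 3) for w in omegas}                 # prior density xi
n = 3
zs = range(n + 1)


def p(z, w):
    """Binomial(n, w) density at z."""
    return comb(n, z) * w**z * (1 - w) ** (n - z)


# An arbitrary confidence procedure C, given as a set of (z, omega) pairs.
C = {(0, omegas[0]), (1, omegas[0]), (1, omegas[1]),
     (2, omegas[1]), (2, omegas[2]), (3, omegas[2])}

# Left side of (6.1): prior average of the coverage probabilities.
lhs = sum(prior[w] * sum(p(z, w) for z in zs if (z, w) in C) for w in omegas)


def h(z):
    """Marginal density of z under the prior."""
    return sum(prior[w] * p(z, w) for w in omegas)


def g(w, z):
    """Posterior density of omega given z."""
    return prior[w] * p(z, w) / h(z)


# Right side of (6.1): marginal average of the posterior content of C_{z.}.
rhs = sum(h(z) * sum(g(w, z) for w in omegas if (z, w) in C) for z in zs)
```

Because the arithmetic uses `Fraction`, `lhs` and `rhs` agree exactly, not merely to rounding error.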

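If $C$ is $B(\alpha, \xi, p)$, then for any selection $k$, $C$ is
$B(\alpha, \xi_k, p^{(k)})$ by Lemma 1, and applying equation (6.1) to $\xi_k$ and
specification $p^{(k)}$ yields

$$\int_\Omega P^{(k)}_\omega(C_{\cdot\omega})\, \xi_k(\omega)\, d\lambda(\omega) = \alpha.$$

It follows immediately that $P^{(k)}_\omega(C_{\cdot\omega}) < \alpha$ $(> \alpha)$ for
all $\omega$ is impossible, so $C$ is $S_1(\alpha)$. If $\xi$ is positive on $\Omega$,
so is $\xi_k$, and $C$ is $S_2(\alpha)$ and Theorem 1 is proved.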

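If $C$ is $B^*(\alpha, \zeta, p)$, then for any selection $k$, $C$ is
$B^*(\alpha, \zeta_k, p^{(k)})$ by Lemma 2. Let $\{\xi_n\}$ be the sequence of prior
densities guaranteed by Theorem 4 and let $\{g_n(\cdot \mid z)\}$ be the
corresponding posterior densities under $p^{(k)}$, converging to
$g_{\zeta_k}(\cdot \mid z)$. Since these are probability densities on $\Omega$, it
follows from Scheffé's theorem [8] that

$$\lim_{n \to \infty} \int_{C_{z\cdot}} g_n(\omega \mid z)\, d\lambda(\omega) = \int_{C_{z\cdot}} g_{\zeta_k}(\omega \mid z)\, d\lambda(\omega) \tag{6.2}$$

uniformly in $z$, with the right hand side identically equal to $\alpha$ by
hypothesis. Then with

$$h_n(z) = \int_\Omega \xi_n(\omega)\, p^{(k)}(z \mid \omega)\, d\lambda(\omega),$$

$$\lim_{n \to \infty} \int_Z h_n(z) \left[ \int_{C_{z\cdot}} g_n(\omega \mid z)\, d\lambda(\omega) \right] d\mu(z) = \alpha,$$

and this, together with equation (6.1) applied to $\xi_n$ and $p^{(k)}$, yields

$$\lim_{n \to \infty} \int_\Omega P^{(k)}_\omega(C_{\cdot\omega})\, \xi_n(\omega)\, d\lambda(\omega) = \alpha.$$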

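Hence, for no $k$ is it possible that

$$\sup_\omega P^{(k)}_\omega(C_{\cdot\omega}) < \alpha \quad \text{or} \quad \inf_\omega P^{(k)}_\omega(C_{\cdot\omega}) > \alpha,$$

so that $C$ is $S_1(\alpha)$ and Theorem 2 is proved.

If $C$ is $B^{**}(\alpha, \zeta, p)$ with $\zeta$ admissible except for the set
$A \subset Z$ then, using the same notation as in the preceding proof, equation
(6.2) holds uniformly in $z$ for $z \notin A$. The right hand side is now not less
than $\alpha$. Then

$$\begin{aligned}
\int_Z h_n(z) \left[ \int_{C_{z\cdot}} g_n(\omega \mid z)\, d\lambda(\omega) \right] d\mu(z) - \alpha
 &= (1 - \alpha) \int_A h_n(z)\, d\mu(z) \\
 &\quad + \int_{Z - A} h_n(z) \left[ \int_{C_{z\cdot}} g_n(\omega \mid z)\, d\lambda(\omega) - \alpha \right] d\mu(z),
\end{aligned}$$

so that

$$\liminf_{n \to \infty} \int_Z h_n(z) \left[ \int_{C_{z\cdot}} g_n(\omega \mid z)\, d\lambda(\omega) \right] d\mu(z) \geq \alpha$$

and hence

$$\liminf_{n \to \infty} \int_\Omega P^{(k)}_\omega(C_{\cdot\omega})\, \xi_n(\omega)\, d\lambda(\omega) \geq \alpha.$$

Then for no $k$ is it possible that

$$\sup_\omega P^{(k)}_\omega(C_{\cdot\omega}) < \alpha,$$

and $C$ is $S_0(\alpha)$ and Theorem 3 is proved.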

Corollary 1 follows immediately from Theorem 1, and Corollaries 2 and 3 
from Theorem 2. Corollary 4 follows from Theorem 2, by noting that C,. is a 
cylinder set with base C7. in Z X @.

AN EXAMPLE OF WIDE DISCREPANCY BETWEEN FIDUCIAL AND
CONFIDENCE INTERVALS


By CHARLES STEIN
Stanford University


1. Introduction. Fisher [1], [2] has emphasized that when he chooses a set in 
the parameter space on the basis of certain observations and attributes to it a 
certain fiducial probability a, he does not intend that, for fixed values of the 
parameter the probability that this random set contains the parameter point 
should be a. Examples of this distinction for the Behrens-Fisher problem have 
been given by Fisher [1], [2] and Neyman [3], [4]. In these cases the numerical 
differences are not extremely large. In order to bring out more clearly the
contrast between fiducial probability and confidence sets I shall give, for each
$\alpha$ and $\epsilon$ in the interval (0, 1), an example where a fiducial interval
for a parameter with fiducial probability equal to $\alpha$ has probability less
than $\epsilon$ of covering the true parameter for a large range of parameter
values. This means that although
a large fiducial probability is claimed, it is practically certain that the interval 
will not cover the true parameter value. Of course this cannot happen when the 
fiducial sets are obtained by Pitman’s methods [6], [7]. 


2. The example. Let $X_1, \ldots, X_n$ be independently normally distributed real
random variables with unknown means $\xi_1, \ldots, \xi_n$ and variance 1. Suppose
we are interested in fiducial or confidence sets for $\sum \xi_i^2$ of the form

$$(f(X_1, \ldots, X_n),\ \infty).$$


We consider the one-sided case only in order to avoid irrelevant computational
details. The fiducial distribution of $\xi_1, \ldots, \xi_n$ is that they are
independently normally distributed with means $X_1, \ldots, X_n$ and variance 1
(see Fisher [1], p. 132, where the case $n = 2$ is given, but see also Tukey [5]
for a different fiducial distribution). Thus the fiducial distribution of
$\sum \xi_i^2$ is a non-central $\chi^2$ distribution with $n$ degrees of freedom and
non-centrality parameter $\sum X_i^2$. On the basis of this we determine a fiducial
interval

$$\left[\Phi_{\alpha,n}\!\left(\textstyle\sum X_i^2\right),\ \infty\right) \tag{1}$$

with fiducial probability $\alpha$ for the unknown parameter $\sum \xi_i^2$. Here
$\Phi_{\alpha,n}(\sum X_i^2)$ is the value which will be exceeded with probability
$\alpha$ by a non-central $\chi^2$ variate with $n$ degrees of freedom and
non-centrality parameter $\sum X_i^2$. But this non-central $\chi^2$ distribution is,
for large $n$, approximately a normal distribution with mean $n + \sum X_i^2$ and
variance $2n + 4\sum X_i^2$, the approximation being uniform in $\sum X_i^2$.

Received January 12, 1959.

Thus for sufficiently large $n$

$$\Phi_{\alpha,n}\!\left(\textstyle\sum X_i^2\right) > n + \sum X_i^2 - t'_\alpha \sqrt{2n + 4\sum X_i^2} \tag{2}$$

for all values of $\sum X_i^2$, where $t'_\alpha$ is a fixed number independent of
$n$ satisfying

$$t'_\alpha > t_\alpha \tag{3}$$

where

$$1 - \alpha = \frac{1}{\sqrt{2\pi}} \int_{t_\alpha}^{\infty} e^{-u^2/2}\, du. \tag{4}$$

Thus for fixed $\xi_1, \ldots, \xi_n$ the probability that the fiducial interval will
cover the true value of $\sum \xi_i^2$ is

$$\begin{aligned}
P_{\xi_1, \ldots, \xi_n}&\left\{\textstyle\sum \xi_i^2 \geq \Phi_{\alpha,n}\!\left(\sum X_i^2\right)\right\} \\
 &\leq P_{\xi_1, \ldots, \xi_n}\left\{\textstyle\sum \xi_i^2 \geq n + \sum X_i^2 - t'_\alpha \sqrt{2n + 4\sum X_i^2}\right\} \\
 &= P_{\xi_1, \ldots, \xi_n}\left\{\textstyle\sum X_i^2 \leq \sum \xi_i^2 - n + t'_\alpha \sqrt{2n + 4\sum X_i^2}\right\}
\end{aligned} \tag{5}$$

for sufficiently large $n$. Now let $n \to \infty$ with

$$\lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^{n} \xi_i^2 = 0. \tag{6}$$

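From Chebyshev's inequality it follows that, for any $\epsilon > 0$, we have, for
sufficiently large $n$,

$$P\left\{2n + 4\textstyle\sum X_i^2 > \epsilon^2 n^2\right\} < \epsilon. \tag{7}$$

Thus

$$\begin{aligned}
P_{\xi_1, \ldots, \xi_n}&\left\{\textstyle\sum X_i^2 \leq \sum \xi_i^2 - n + t'_\alpha \sqrt{2n + 4\sum X_i^2}\right\} \\
 &\leq P_{\xi_1, \ldots, \xi_n}\left\{\textstyle\sum X_i^2 \leq \sum \xi_i^2 - n + t'_\alpha\, \epsilon n\right\} + \epsilon.
\end{aligned} \tag{8}$$

Again applying Chebyshev's inequality, or the limiting distribution of
$\sum X_i^2$, it follows from (6) and (8) that

$$\lim_{n \to \infty} P_{\xi_1, \ldots, \xi_n}\left\{\textstyle\sum \xi_i^2 \geq \Phi_{\alpha,n}\!\left(\sum X_i^2\right)\right\} = 0. \tag{9}$$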

Let us compare these results with the natural confidence sets. Since $\sum X_i^2$
has a non-central $\chi^2$ distribution with $n$ degrees of freedom and
non-centrality parameter $\sum \xi_i^2$, the confidence sets of the desired form are

$$\left[\Phi^{-1}_{1-\alpha,n}\!\left(\textstyle\sum X_i^2\right),\ \infty\right) \tag{10}$$

or, approximately for large $n$,

$$\sum X_i^2 \leq \sum \xi_i^2 + n + t_\alpha \sqrt{2n + 4\sum \xi_i^2} \tag{11}$$

(which must be inverted to obtain an explicit lower confidence bound for
$\sum \xi_i^2$) as compared with the fiducial interval

$$\left[\Phi_{\alpha,n}\!\left(\textstyle\sum X_i^2\right),\ \infty\right) \tag{12}$$

which is approximately, for large $n$,

$$\sum \xi_i^2 \geq \sum X_i^2 + n - t_\alpha \sqrt{2n + 4\sum X_i^2}. \tag{13}$$


However a different argument leads to a more reasonable fiducial distribution.
Because of the rotational symmetry of the problem, it seems reasonable to base
our procedure only on $Z = \sum X_i^2$, ignoring the individual observations. Then
the fiducial argument leads to the intervals based on (10), i.e. confidence intervals.
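
The contrast between the two intervals can be seen in a small simulation. The following sketch is not from the paper: it assumes the large-$n$ normal approximations (11) and (13) in place of exact non-central $\chi^2$ quantiles, and the function name `coverage`, the seed, and the trial count are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist


def coverage(n, xi, alpha=0.95, trials=2000, seed=0):
    """Monte Carlo coverage of the approximate fiducial and confidence bounds."""
    rng = random.Random(seed)
    t = NormalDist().inv_cdf(alpha)          # t_alpha as defined in (4)
    target = sum(x * x for x in xi)          # true value of sum xi_i^2
    fid = conf = 0
    for _ in range(trials):
        s = sum(rng.gauss(m, 1.0) ** 2 for m in xi)   # observed sum X_i^2
        # Fiducial interval (13): covers when target >= s + n - t*sqrt(2n + 4s).
        if target >= s + n - t * (2 * n + 4 * s) ** 0.5:
            fid += 1
        # Confidence set (11): covers when s <= target + n + t*sqrt(2n + 4*target).
        if s <= target + n + t * (2 * n + 4 * target) ** 0.5:
            conf += 1
    return fid / trials, conf / trials


fid_cov, conf_cov = coverage(n=200, xi=[0.0] * 200)
```

With all $\xi_i = 0$ and $n = 200$, the approximate fiducial lower bound is positive for every possible sample, so the fiducial interval never covers the true value, while the confidence set covers it in roughly the nominal proportion of trials.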

At first I intended to write this paper without extended comments, letting the 
example speak for itself. However some remarks of the editor and referees and 
the fact that I have since read the discussion of fiducial inference in Chapter 6 
of Quenouille [8] lead me to believe that some discussion may be useful. Two 
questions may be asked in connection with the above example. Is the argument
used a fiducial argument as this is understood by the advocates of fiducial
inference, and is the resulting fiducial distribution of $\sum \xi_i^2$ absurd? The
fiducial argument has two parts. First the joint fiducial distribution of
$\xi_1, \ldots, \xi_n$ is given on the authority of [1]. Then the distribution of
$\sum \xi_i^2$ is calculated from this joint distribution, the joint fiducial
distribution being treated as an ordinary probability distribution. The first
step seems to be in agreement with the practice of the
advocates of fiducial inference. For example, Quenouille on p. 139 of [8] argues 
against the different fiducial distribution given by Tukey [5]. Anyone who argues 
that the second step is not justified seems to be saying that fiducial distributions 
cannot be treated as ordinary probability distributions. In [8] on pp. 114-119, 
Quenouille imposes restrictions on the way some fiducial distributions can be 
used, but (at least to me) it is not clear whether these restrictions are meant to 
apply to cases as simple as the one discussed in this paper, nor is it clear whether 
my derivation meets his requirements if they are applicable to the present case. 

Finally it may be contended that the fiducial interval (1) is the correct one 
and should be used. Because of the conflict with the argument immediately 
below (13), I do not think many people will take this attitude. Apart from this, 
there is an important question of principle here. If $n$ is large and
$\sum \xi_i^2$ is small compared with $n^2$ (which is commonly the case if the
$\xi_i$ are coordinates of a high order interaction), then the probability that
the fiducial interval (1) will cover the true value of $\sum \xi_i^2$ has been
shown to be small if $\alpha$ is moderate. This has the practical interpretation
that, when the fiducial interval (1) is applied in such situations, it will not
cover the true value in the vast majority of cases that actually arise. For this
reason I cannot understand the contention that the probability of covering a
fixed parameter point is irrelevant to inferences of this type.
OPTIMUM INVARIANT TESTS¹


By E. L. LEHMANN 
University of California, Berkeley 
Summary. The standard (likelihood ratio) test of the general linear hypothesis 
has been shown to possess numerous different optimum properties. A brief 
survey of these was included in a recent paper by Kiefer [2]. In the present 
note it is shown that all of these, and in fact a wide class of optimum properties 


of which the above are special cases, are consequences of the fact that the test 
is uniformly most powerful invariant. 


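1. Order relations among tests. Let $X$ be a random variable with possible
distributions $\mathcal{P} = \{P_\theta,\ \theta \in \Omega\}$ and consider the
hypothesis $H: \theta \in \omega$ where $\omega$ is a subset of $\Omega$. Suppose that
the problem of testing $H$ against the alternatives $K: \theta \in \Omega - \omega$
remains invariant under a group $G$ of transformations of the sample space. Let
$\mathcal{F}$ be a class of tests $\varphi$ of $H$, for example the class of all
level $\alpha$ tests or of all unbiased level $\alpha$ tests, which is invariant
under $G$ in the sense that $\varphi \in \mathcal{F}$ implies
$\varphi g \in \mathcal{F}$ for all $g \in G$. Here $\varphi g$ denotes the critical
function defined by

$$\varphi g(x) = \varphi(gx).$$

Suppose that a relation $\preceq$ has been defined among the tests of
$\mathcal{F}$ such that every pair $\varphi, \varphi' \in \mathcal{F}$ satisfies
either $\varphi \preceq \varphi'$ or $\varphi' \preceq \varphi$. When both of these
relations hold, we write $\varphi \sim \varphi'$. Let the (weak) ordering $\preceq$
satisfy the following conditions:

(i) If $\varphi'$ is uniformly at least as powerful as $\varphi$, then
$\varphi \preceq \varphi'$.

(ii) If $\varphi_\gamma$, $\gamma \in \Gamma$, is any family of tests belonging to
$\mathcal{F}$ and $\nu$ any probability measure over the label space $\Gamma$, then
$\varphi \preceq \varphi_\gamma$ for all $\gamma \in \Gamma$ implies
$\varphi \preceq \int \varphi_\gamma\, d\nu(\gamma)$.

(iii) If $\varphi_0 \preceq \varphi_n$ for $n = 1, 2, \ldots$ and if $\varphi$ is a
critical function such that the power functions
$\beta_{\varphi_n}(\theta) \to \beta_\varphi(\theta)$ for all $\theta \in \Omega$ as
$n \to \infty$, then $\varphi_0 \preceq \varphi$.

(iv) If $\varphi \preceq \varphi'$ then $\varphi g \preceq \varphi' g$ for all
$g \in G$.

A test $\varphi_0 \in \mathcal{F}$ will be called optimum within $\mathcal{F}$
according to this ordering if $\varphi \preceq \varphi_0$ for all
$\varphi \in \mathcal{F}$.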

The following are some examples of such orderings, which have been con- 
sidered in the literature. Throughout, 8, denotes the power function of ¢. 

Example 1. Let $a(\theta) \geq 0$ and $b(\theta)$ be functions which are invariant
under the transformations $\bar{G}$ induced by $G$ in the parameter space, and let
$\varphi \preceq \varphi'$ if

$$\inf\, [a(\theta) \beta_\varphi(\theta) + b(\theta)] \leq \inf\, [a(\theta) \beta_{\varphi'}(\theta) + b(\theta)].$$

Then conditions (i) to (iv) are clearly satisfied. A particular case is obtained
by putting $b(\theta) = 0$; $a(\theta) = 1$ if $\theta \in \omega'$ and
$a(\theta) = 0$ otherwise, where $\omega'$ is an invariant subset of
$\Omega - \omega$. Then $\varphi \preceq \varphi'$ if

$$\inf_{\omega'} \beta_\varphi(\theta) \leq \inf_{\omega'} \beta_{\varphi'}(\theta).$$

A test is optimum according to this ordering if it maximizes the minimum power
over $\omega'$.

Received September 15, 1958.
¹ This paper was prepared with the partial support of the Office of Naval Research

Example 2. Let the tests be ordered according to $-s(\varphi)$ where $s(\varphi)$
is the stringency of $\varphi$ defined by

$$s(\varphi) = \sup\, [\beta^*(\theta) - \beta_\varphi(\theta)]$$

with $\beta^*$ denoting the envelope power function. Then $\varphi \preceq \varphi'$
if $s(\varphi) \geq s(\varphi')$ and the four conditions are again easily verified.

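Example 3. Let $\omega'$ be an invariant subset of $\Omega - \omega$ and suppose
that there exists a probability distribution $\lambda$ over $\omega'$ which is
invariant under the group $\bar{G}$ induced by $G$ in the parameter space. Then the
relation $\varphi \preceq \varphi'$ if

$$\int_{\omega'} \beta_\varphi(\theta)\, d\lambda(\theta) \leq \int_{\omega'} \beta_{\varphi'}(\theta)\, d\lambda(\theta)$$

also satisfies conditions (i) to (iv).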

Example 4. Suppose that $\theta = (\theta_1, \ldots, \theta_s)$ and that $\omega$
consists of the single point $\theta^0 = (\theta_1^0, \ldots, \theta_s^0)$. We shall
assume that the power function $\beta_\varphi(\theta)$ of any test $\varphi$
possesses continuous second derivatives
$\partial^2 \beta / \partial\theta_i\, \partial\theta_j$ for all $i$ and $j$ at
$\theta^0$. Let $\mathcal{F}$ be the class of all level $\alpha$ tests that are
strictly unbiased in the neighborhood of $\theta^0$ and let $\Delta(\varphi)$ denote
the Gaussian curvature of the power surface at $\theta^0$, which is given by the
determinant of the positive definite matrix
$(\partial^2 \beta / \partial\theta_i\, \partial\theta_j)|_{\theta^0}$. The relation
$\varphi \preceq \varphi'$ if $\Delta(\varphi) \leq \Delta(\varphi')$ clearly
satisfies (i) and (iii). It follows from a remark of Isaacson [1] that the
relation is invariant provided the transformations $\bar{g}$ of the parameter
space possess continuous second partial derivatives at $\theta^0$, which (under
this restriction) verifies (iv). Condition (ii), finally, is easily verified.
Optimum tests according to the present ordering correspond to the type D tests
of Isaacson.


2. Consequences of the Hunt-Stein theorem. Under the assumptions of the
preceding section we shall now show that if $G$ satisfies the conditions of the
Hunt-Stein theorem (cf. [3], p. 336) and if there exists a test $\varphi_0$ which
is optimum according to the ordering $\preceq$, then there exists an almost
invariant test which is optimum. Here we require of $\mathcal{F}$ that it be
closed under convex combinations and under weak limits.

The proof is completely analogous to and essentially follows from that of the
Hunt-Stein theorem, and can be indicated very briefly. If $\nu_n$ is the sequence
of almost invariant probability measures over $G$ postulated in the theorem,
consider the sequence of tests

$$\psi_n = \int \varphi_0 g\, d\nu_n(g).$$

Let $\psi$ be the weak limit of a subsequence $\psi_{n_i}$. Then it is shown in the
proof of the Hunt-Stein theorem that $\psi$ is almost invariant, and it remains
only to show that $\psi$ is optimum. By conditions (iv) and (ii) it follows for
any $\varphi \in \mathcal{F}$ that $\varphi \preceq \psi_n$ for all $n$. Hence by
condition (iii) also $\varphi \preceq \psi$ for all $\varphi \in \mathcal{F}$, as
was to be proved.
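
The step from conditions (iv) and (ii) to the comparison with the averaged tests can be spelled out. The chain below is a sketch of the implicit reasoning, not a quotation from the paper. It writes $\mathcal{F}$ for the class of tests, $\preceq$ for the ordering, $\varphi_0$ for the optimum test and $\psi_n$ for its average over $\nu_n$, and it assumes (as the invariance of the class provides) that $\varphi g^{-1}$ belongs to the class whenever $\varphi$ does.

```latex
\begin{align*}
\varphi g^{-1} \preceq \varphi_0 \ \text{for all } g \in G
  &\quad\text{(optimality of } \varphi_0 \text{ within } \mathcal{F}\text{)} \\
\Longrightarrow\ \varphi = (\varphi g^{-1})\, g \preceq \varphi_0\, g \ \text{for all } g \in G
  &\quad\text{(condition (iv))} \\
\Longrightarrow\ \varphi \preceq \int \varphi_0\, g \; d\nu_n(g) = \psi_n
  &\quad\text{(condition (ii) with } \gamma = g,\ \nu = \nu_n\text{)}.
\end{align*}
```

Condition (iii) then carries the comparison over to the weak limit of the averaged tests.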

Under the above assumptions, whenever there exists a UMP almost invariant 
test, this will be optimum with respect to any ordering < satisfying conditions 
(i)-(iv). This explains the great variety of optimum properties possessed by 
certain tests and makes it unnecessary to prove each of them separately. 


3. Applications. Consider a sequence of $n$ independent trials and let $X_i = 1$
or 0 as the $i$th trial is or is not successful. Let $P(X_i = 1) = p_i$ and
consider the hypothesis $H: p_1 = \cdots = p_n = \tfrac{1}{2}$ against the
alternatives

$$p_i > \tfrac{1}{2} \qquad (i = 1, \ldots, n).$$

The problem is invariant under any permutation of the variables and the sign
test, which rejects when $\sum X_i > C$, is uniformly most powerful almost
invariant (cf. [3], p. 219). This test therefore maximizes the minimum power
over the alternatives $\omega': \min p_i \geq \tfrac{1}{2} + \Delta$ or
$\omega'': \max p_i \geq \tfrac{1}{2} + \Delta$ for any $\Delta > 0$; it is
most stringent and of type D.
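
For the sign test the minimum power over the set where every $p_i$ is at least $\tfrac{1}{2} + \Delta$ is easy to compute, because the rejection probability is nondecreasing in each $p_i$ and the minimum is therefore attained when all $p_i$ equal $\tfrac{1}{2} + \Delta$. The sketch below assumes this reduction; the function names are illustrative.

```python
from math import comb


def sign_test_power(n, c, p):
    """P(sum X_i > c) when every trial succeeds with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1, n + 1))


def min_power(n, c, delta):
    """Minimum power over the alternatives with min p_i >= 1/2 + delta.

    The rejection probability is nondecreasing in each p_i, so the
    minimum is attained when every p_i equals 1/2 + delta.
    """
    return sign_test_power(n, c, 0.5 + delta)
```

Calling `sign_test_power(n, c, 0.5)` gives the size of the test, so the same function can be used to choose the critical value $C$ for a given level.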

As a second application, consider the general univariate linear hypothesis in
the canonical form according to which the variables $X_1, \ldots, X_r$;
$Y_1, \ldots, Y_s$; $Z_1, \ldots, Z_m$ are independently normally distributed with
common variance $\sigma^2$ and means $E(X_i) = \xi_i$, $E(Y_i) = \eta_i$,
$E(Z_i) = 0$. The hypothesis to be tested is $H: \xi_1 = \cdots = \xi_r = 0$. This
problem remains invariant under the three groups

$G_1$: $Y_i' = Y_i + c_i$ $(-\infty < c_i < \infty)$; $X_i' = X_i$; $Z_i' = Z_i$.

$G_2$: Orthogonal transformations of $X_1, \ldots, X_r$; $Y_i' = Y_i$; $Z_i' = Z_i$.

$G_3$: $X_i' = aX_i$; $Y_i' = aY_i$; $Z_i' = aZ_i$ $(a \neq 0)$.

The standard test has the following two basic optimum properties:

(a) It is uniformly most powerful among level $\alpha$ tests which are almost
invariant with respect to $G_1$, $G_2$, $G_3$.

(b) It is uniformly most powerful among all unbiased (or similar) level
$\alpha$ tests which are almost invariant with respect to $G_2$.

The first of these is well known; the second is easily shown by a standard
argument.

Since the groups $G_1$ to $G_3$ satisfy the conditions of the Hunt-Stein theorem,
it follows from (a), for example, that the standard test is most stringent and
that it maximizes the minimum power against the class of alternatives

$$\omega': \sum \xi_i^2 / \sigma^2 \geq \Delta.$$

To apply (b), consider fixed values of $\eta_1, \ldots, \eta_s$ and $\sigma$, so
that the power becomes a function only of $\xi_1, \ldots, \xi_r$. It then follows
that for any $\eta_1, \ldots, \eta_s$ and $\sigma$ the standard test maximizes
(among all unbiased level $\alpha$ tests), for example, the minimum power over the
sets $\omega'(\eta_1, \ldots, \eta_s, \sigma): \sum \xi_i^2 \geq \Delta$ and the
average power over the spheres $\sum \xi_i^2 = \Delta$. This was first proved by
Wald [4]. It follows further that the test maximizes the Gaussian curvature of
the power surface, considered for fixed $\eta_1, \ldots, \eta_s, \sigma$ as a
function of the $\xi$'s, and hence is of Isaacson's type E. This has been shown
previously by Kiefer [2], who deduced it as a consequence of the test maximizing
the average power over the spheres

$$\sum \xi_i^2 = \Delta.$$

THE WEIGHTED COMPOUNDING OF TWO INDEPENDENT
SIGNIFICANCE TESTS


By M. ZELEN AND L. S. JOEL


National Bureau of Standards 


1. Introduction and outline of the problem. In a recent paper on the analysis 
of incomplete block designs [9], the situation arose where one had two statistically 
independent F statistics for testing the same null hypothesis. A test was pro- 
posed for combining the two tests themselves into a single test which weighted 
one test relative to the other. It is the purpose of this paper to investigate 
numerically the power function of this proposed test as it will shed some light 
as to when an intra-block analysis is worthwhile. 

Other situations where one has more than one independent test for testing 
the same null hypothesis are not uncommon. The tests may have arisen from 
several sets of independent data or from independent tests made on the same 
data. General discussions of combining independent tests can be found in
Mosteller and Bush [4], Birnbaum [1], and E. S. Pearson [6]. For example a common
situation in clinical experiments is that one desires to investigate the effects
of two treatments (say) $t_1$ and $t_2$ on $2n + m$ people. It is known in advance
that $m$ of these people will be available for receiving only one treatment. The
experiment is run by assigning $t_1$ to $(m + n)$ subjects and $t_2$ to the
remaining $n$ people. At a later time, $r$ new people are available who receive
treatment $t_2$. Also of the $2n$ original remaining people, the $n$ people who
first received $t_1$ receive $t_2$ and vice-versa. Thus the data consist of a
cross-over design making use of $2n$ people, and also data where a person
received only a single treatment. Thus it is possible to have two tests of the
same null hypothesis that the treatments have no effect.¹

The problem of combining information can be formulated as a problem in
estimation. For applications, this latter formulation is usually preferred, as
it leads to confidence statements rather than to tests of a null hypothesis.
However it seems interesting from a theoretical point of view to explore the
consequences of combining the significance tests themselves.

Let there be two independent variance ratio statistics given by

$$F_j = s_{1j}^2 / s_{2j}^2, \qquad j = 1, 2,$$

with degrees of freedom $\nu$ and $f_j$ $(j = 1, 2)$ respectively used to test the
same null hypothesis. The numerator and denominator mean squares will be
referred to as the "treatment" and "error" mean squares and are such that
$f_j s_{2j}^2 / \sigma_j^2$ follows a (central) chi-square distribution and
$\nu s_{1j}^2 / \sigma_j^2$ follows a non-central chi-square distribution with
non-central parameter $\delta_j$, where $\delta_j$ is defined by

$$\frac{E(s_{1j}^2)}{\sigma_j^2} = 1 + \frac{\delta_j}{\nu}, \qquad j = 1, 2. \tag{1}$$

When $\delta_j = 0$, the null hypothesis is true and $\nu s_{1j}^2 / \sigma_j^2$ will
follow a (central) chi-square distribution.

Received February 20, 1958; revised January 8, 1959.
¹ We are indebted to Dr. S. Geisser, National Institute of Mental Health, for
pointing out this example.


For the purpose of combining the two tests, consider the integral
transformation

$$P_j = P\{F \geq F_j \mid H_0\}, \qquad j = 1, 2, \tag{2}$$

which is the probability of the F-ratio exceeding the calculated $F_j$ if the
null hypothesis is true. Our proposed method for combining the two tests is to
use the critical region

$$\omega: \{P_1 P_2^{\theta} \leq C_\alpha\} \tag{3}$$

where $C_\alpha$ is a constant depending on an $\alpha$ level of significance and
$\theta$ is a weighting factor $(0 \leq \theta \leq 1)$ which weights the second
test relative to the first. (We will always assume that the first test has power
which is equal to or greater than the second test.) This test is closely related
to the procedure suggested by Good [3] for combining independent tests. Note
that when $\theta = 0$ this corresponds to only using the first test; when
$\theta = 1$, both tests are given equal weight and the procedure is equivalent
to the well-known method of Fisher [2] for combining independent tests of
significance. The real problem here is to determine how to choose the weighting
factor $\theta$. Our procedure for choosing $\theta$ is to let
$\theta = \delta_2 / \delta_1$, which in turn can be written as
$\theta = (c_2 / c_1)(\sigma_1^2 / \sigma_2^2)$ where the $c_j$ are known constants.
It is remarkable that this choice of a weighting factor results in minimum
Type II error over a wide range of the other parameters involved.


2. Distribution of the combined test. 

Null distribution. It is well known that when the null hypothesis is true, the
distribution of $P_j$ will be that of a uniform random variable over the unit
interval. Therefore the Type I error of the combined test is

$$P(\omega \mid H_0) = P\{P_1 P_2^{\theta} \leq C \mid H_0\} = \iint_\omega dP_1\, dP_2, \tag{4}$$

where $\omega$ denotes the critical region $\{P_1 P_2^{\theta} \leq C\}$. Hence by an
elementary integration we have

$$P(\omega \mid H_0) = \begin{cases} C, & \text{for } \theta = 0, \\[1ex] \dfrac{C - \theta C^{1/\theta}}{1 - \theta}, & \text{for } 0 < \theta < 1, \\[2ex] C(1 - \ln C), & \text{for } \theta = 1. \end{cases} \tag{5}$$

Therefore setting $P(\omega \mid H_0) = \alpha$ results in critical values of
$C_\alpha$ for an $\alpha$ level of significance. Table I gives critical values of
$C_\alpha$ for $\theta = 0.1(.1)1.0$ and $\alpha = .01, .05$.

TABLE I
Critical values of $C_\alpha$ for $\alpha = .05, .01$
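
Critical values of the kind tabulated in Table I can be recomputed directly from equation (5). The sketch below assumes a simple bisection, which is adequate because the Type I error is increasing in $C$ on $(0, 1)$; the function names `size`, `critical_value` and `reject` are illustrative and are not taken from the paper.

```python
from math import log


def size(c, theta):
    """Type I error P(P1 * P2**theta <= c | H0) from equation (5), for 0 < c < 1."""
    if theta == 0:
        return c
    if theta == 1:
        return c * (1 - log(c))
    return (c - theta * c ** (1 / theta)) / (1 - theta)


def critical_value(alpha, theta, tol=1e-14):
    """Solve size(C, theta) = alpha for C by bisection; size is increasing in C."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if size(mid, theta) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reject(p1, p2, alpha, theta):
    """Decision of the combined test with critical region (3)."""
    return p1 * p2**theta <= critical_value(alpha, theta)
```

For $\theta = 1/2$ equation (5) becomes $2C - C^2 = \alpha$, so $C_\alpha = 1 - \sqrt{1 - \alpha}$, which provides a closed-form check; for $\theta = 1$ the result agrees with Fisher's method based on $-2 \ln(P_1 P_2)$.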
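Non-null distribution. If

$$x_j = \frac{\nu F_j}{f_j + \nu F_j}, \tag{6}$$

then the non-null distribution of $x_j$ will have the p.d.f.

$$p(x_j \mid \delta_j) = e^{-\delta_j/2} \sum_{i=0}^{\infty} \frac{(\delta_j/2)^i}{i!}\, \frac{x_j^{\nu/2 + i - 1} (1 - x_j)^{f_j/2 - 1}}{B\!\left(\frac{\nu}{2} + i,\ \frac{f_j}{2}\right)}, \qquad 0 \leq x_j \leq 1, \tag{7}$$

and when $\delta_j = 0$, (7) reduces to the beta distribution,

$$p(x_j \mid 0) = \frac{x_j^{\nu/2 - 1} (1 - x_j)^{f_j/2 - 1}}{B\!\left(\frac{\nu}{2},\ \frac{f_j}{2}\right)}, \qquad 0 \leq x_j \leq 1. \tag{8}$$

From the elementary properties of the probability integral transformation
(cf. Pearson [6]) the p.d.f. of $P_j$ when the null hypothesis is not correct is
given by

$$f(P_j \mid \delta_j) = \left. \frac{p(x_j \mid \delta_j)}{p(x_j \mid 0)} \right|_{x_j = g(P_j)} \tag{9}$$

where $x_j = g(P_j)$ means the solution of $x_j$ for a given value of $P_j$, where
$x_j$ and $P_j$ are related by

$$P_j = \int_{x_j}^{1} p(x \mid 0)\, dx. \tag{10}$$

Hence substituting (7) and (8) in (9) results in the p.d.f. of the non-null
distribution of $P_j$, i.e.,

$$f(P_j \mid \delta_j) = e^{-\delta_j/2} \sum_{i=0}^{\infty} \frac{(\delta_j/2)^i}{i!}\, \frac{B\!\left(\frac{\nu}{2},\ \frac{f_j}{2}\right)}{B\!\left(\frac{\nu}{2} + i,\ \frac{f_j}{2}\right)}\, x_j^{\,i}, \tag{11}$$

where $x_j$ is related to $P_j$ by the incomplete beta function

$$P_j = 1 - I_{x_j}\!\left(\frac{\nu}{2},\ \frac{f_j}{2}\right). \tag{12}$$

Therefore the power of the combined test for a given level of significance
$\alpha$ is

$$P(\omega \mid H_1) = \iint_\omega f(P_1 \mid \delta_1)\, f(P_2 \mid \delta_2)\, dP_1\, dP_2, \tag{13}$$

where the region of integration is $\omega: \{P_1 P_2^{\theta} \leq C_\alpha\}$.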

The integral given in (13) is difficult to evaluate, as the p.d.f.
$f(P_j \mid \delta_j)$ is not an explicit function of $P_j$. In order to evaluate
(13) numerically it is convenient to consider the integral transformation

$$\pi_j = \int_{P_j}^{1} f(P \mid \delta_j)\, dP, \qquad j = 1, 2. \tag{14}$$

Then (13) can be written

$$P(\omega \mid H_1) = \iint_{\omega^*} d\pi_1\, d\pi_2 \tag{15}$$

where $\omega^*$ denotes the region $\omega$ in terms of $\pi_1$ and $\pi_2$. Thus to
every point on the boundary $P_1 P_2^{\theta} = C_\alpha$ in the $(P_1, P_2)$ space
there will correspond a point in the $(\pi_1, \pi_2)$ space and it will be possible
to map the region $\omega^*$ entirely, even though we do not have an explicit
expression in the $\pi_1, \pi_2$ variables for the boundary.

For this purpose it is convenient to find $\pi_j$ from the non-central
distribution of $x_j$, i.e.,

$$\pi_j = \int_{P_j}^{1} f(P \mid \delta_j)\, dP = \int_0^{x_j} p(x \mid \delta_j)\, dx, \qquad j = 1, 2. \tag{16}$$

Unfortunately the non-central distribution given by (16) is only tabulated for
values of $x_j$ corresponding to $P_j = .01$ and $.05$. However it is possible to
use the Patnaik approximation to the non-central F (or equivalent beta)
distribution [5] and find approximate values for $\pi_j$. (This approximation
appears to have a maximum error of one unit in the second decimal.) For
purposes of tabulation, it is more convenient to use the non-central variable
$\lambda_j = \delta_j / \nu$, which is related to Tang's non-central parameter
$\phi$ [7] by $\phi = [\nu\lambda / (\nu + 1)]^{1/2}$. Then in terms of $\lambda_j$,
the Patnaik approximation can be written

$$\int_0^{x_j} p(x \mid \delta_j)\, dx \approx I_{x_j'}\!\left(\frac{\nu'}{2},\ \frac{f_j}{2}\right), \tag{17}$$

where

$$\nu' = \frac{\nu (1 + \lambda_j)^2}{1 + 2\lambda_j}, \qquad x_j' = \frac{(1 + \lambda_j)\, x_j}{1 + 2\lambda_j - \lambda_j x_j}. \tag{18}$$
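
The parameter adjustment used in the Patnaik approximation is a two-line computation. The sketch below implements only the transformation of the degrees of freedom and of the beta variable, taking `lam` to mean the ratio $\lambda_j = \delta_j / \nu$; evaluating the incomplete beta ratio itself is left to whatever routine is available, and the function name `patnaik` is an illustrative choice.

```python
def patnaik(nu, lam, x):
    """Patnaik adjustment of the degrees of freedom and beta variable.

    Returns (nu_prime, x_prime) such that the non-central beta integral up
    to x, with non-centrality lam = delta / nu, is approximated by the
    central incomplete beta ratio I_{x_prime}(nu_prime / 2, f / 2).
    """
    nu_prime = nu * (1 + lam) ** 2 / (1 + 2 * lam)
    x_prime = (1 + lam) * x / (1 + 2 * lam - lam * x)
    return nu_prime, x_prime
```

When `lam` is zero the adjustment leaves both quantities unchanged, and `x_prime` equals 1 when `x` equals 1, so the transformed variable stays in the unit interval.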


Table II summarizes calculations for the Type II error ($P_{II}$) for the
parameters

$\alpha = .05$, $\theta = 0(.2)1.0$, $\lambda_1 = 1(2)5$, $\lambda_2 = 0(1)4$, $\lambda_2 < \lambda_1$,
$(\nu, f_1, f_2) = (5, 10, 5), (5, 15, 5), (5, 15, 10), (10, 10, 5), (10, 15, 5),$
$(10, 15, 10), (5, 30, 10), (5, 30, 15), (5, 30, 20), (5, 30, 25)$.

$\alpha = .05$, $\theta = 0(.2)1.0$, $\lambda_1 = 1(2)3$, $\lambda_2 = 0(1)2$, $\lambda_2 < \lambda_1$,
$\nu = 10, 15$, $f_1 = 30$, $f_2 = 10(5)25$.

$\alpha = .01$, $\theta = 0(.2)1.0$, $\lambda_1 = 1(2)5$, $\lambda_2 = 0(1)4$, $\lambda_2 < \lambda_1$, $\nu = 5, 10$,
$(f_1, f_2) = (10, 5), (15, 5), (15, 10)$; $f_1 = 30$, $f_2 = 10(5)25$.

$\alpha = .01$, $\theta = 0(.2)1.0$, $\lambda_1 = 1(2)5$, $\lambda_2 = 0(1)4$, $\lambda_2 < \lambda_1$, $\nu = 15$,
$(f_1, f_2) = (15, 5), (15, 10)$; $f_1 = 30$, $f_2 = 10(5)25$.

$\alpha = .01$, $\theta = 0(.2)1.0$, $\lambda_1 = 7$, $\lambda_2 = 0(2)6$, $\nu = 5, 10$,
$(f_1, f_2) = (10, 5), (15, 5), (15, 10)$.

Since Patnaik's approximation may be in error by one unit in the second decimal
place, the accuracy of Table II is limited to at best an error of the same
magnitude. Interpolation in the table on any of the degrees of freedom
parameters should be made using the reciprocals, i.e., $\nu^{-1}$, $f_j^{-1}$.


3. The effect of the weight factor on the Type II error. A typical Type II error
curve is graphed in Fig. 1 for the parameters $\alpha = .05$, $\nu = 5$,
$f_1 = 10$, $f_2 = 5$. Note that it is possible for the Type II error of the
combined test to be larger than if a single test had been used alone. This
corresponds to the case when the second test is given too much weight.

Note also that the minimum $P_{II}$ is rather flat. For example for
$\lambda_1 = 5$, $\lambda_2 = 1$ the minimum is between $\theta = .2$ and
$\theta = .4$. This is typical of the behavior of $P_{II}$. Table III shows the
range of $\theta$ for which the minimum $P_{II}$ (to two decimal places) was
attained. Also given in this table is the ratio
$\lambda_2 / \lambda_1 = \delta_2 / \delta_1$, which we put forward as the weighting
factor. In the entire table of $P_{II}$ this choice of $\theta$ will result in
being off by at most one unit in the second decimal from the minimum $P_{II}$.

Fig. 1. Type II error ($P_{II}$) for $\nu = 5$, $f_1 = 10$, $f_2 = 5$, $\alpha = .05$.


In general the non-centrality parameter can be written as 


(19) _= = j = 1,2, 
Cc: 


where $c_j$ is a known constant which depends on how the observations were taken, $\sigma_j^2$ is an underlying population variance, and $\gamma^2$ is a non-negative constant which depends on the particular hypothesis involved and is equal to zero only if the null hypothesis is true. Hence,


(20) 


which is a function only of the known constants $c_j$ and the ratio of the population variances. Of course in many practical situations the ratio of the variances $\sigma_1^2/\sigma_2^2$ may not be known. In this case we believe that the estimate for $\sigma_1^2/\sigma_2^2$ can be used in the weighting factor. This recommendation is based on the fact that the weighting factor need not be known accurately in order to achieve a minimum $P_{II}$. However, it should be pointed out that this latter procedure will result in a change in the significance level and power of the test.
BAYES ACCEPTANCE SAMPLING PROCEDURES FOR LARGE LOTS¹
By D. Guthrie, Jr. and M. V. Johns, Jr.


Stanford University 


1. Introduction and statement of the main results. A lot consisting of $N$ items may be characterized by $N$ non-negative random variables $X_i$, $i = 1, 2, \ldots, N$, where the value of $X_i$ indicates the quality of the $i$th item. In a typical case $X_i$ might take on the values zero and one according to whether the $i$th item is non-defective or defective. Alternatively, $X_i$ might be defined to be the number of defects in the $i$th item so that the possible values of $X_i$ would be $0, 1, 2, \ldots$. In still another formulation $X_i$ might be a continuous random variable related to the deviation from standard of some characteristic of the item. We shall assume that the random variables $X_i$, $i = 1, 2, \ldots, N$, are independent and identically distributed with common distribution function $F(x \mid \lambda)$ depending on a single parameter $\lambda$.

The fixed size sampling inspection scheme to be considered consists of the random selection of $n$ items from the lot and the observation of the values of the corresponding $X_i$'s. Thus, the sample may be described by the random variables $X_1, X_2, \ldots, X_n$. The two possible actions to be taken on the basis of the sample are acceptance or rejection of the uninspected remainder of the lot. The consequences of these alternative actions are appraised by the following cost model, where we let $S_k = \sum_{i=1}^{k} X_i$ for any $k = 1, 2, \ldots, N$:

         Action        Cost
(1.1)    Acceptance    $a_1(S_N - S_n) + a_2(N - n) + s_1 S_n + s_2 n$
         Rejection     $r_1(S_N - S_n) + r_2(N - n) + s_1 S_n + s_2 n$.

Thus, for $i = n + 1, n + 2, \ldots, N$, the contributions to the total cost due to the acceptance or rejection of the $i$th item without inspection are given by $a_1 X_i + a_2$ and $r_1 X_i + r_2$ respectively. For $i = 1, 2, \ldots, n$ the cost of inspection (and possibly replacement) of the $i$th item is given by $s_1 X_i + s_2$. If, for example, an item is classified as defective or non-defective by $X_i$, then $S_N$ and $S_n$ are the number of defective items in the lot and in the sample respectively. Suppose that the cost of accepting an item is $a_1$ if the item is defective and zero if the item is non-defective, and that the cost of rejecting the uninspected remainder of the lot is proportional to the number of items remaining in the lot. Then $a_2 = r_1 = 0$. If, in addition, all items found to be defective in the sample are replaced with good items, each at a cost of $s_1$ units, and $s_2$ represents the cost of the time and labor required to inspect each item in the sample, then (1.1) becomes

         Action        Cost
(1.1a)   Acceptance    $a_1(S_N - S_n) + s_1 S_n + s_2 n$
         Rejection     $r_2(N - n) + s_1 S_n + s_2 n$.
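
The cost model (1.1) can be evaluated directly once the lot and sample totals are known. The following is a minimal sketch in Python; the function name `lot_cost` and all numerical values are hypothetical and chosen only for illustration (they correspond to the special case (1.1a) with $a_2 = r_1 = 0$).

```python
def lot_cost(action, S_N, S_n, N, n, a1, a2, r1, r2, s1, s2):
    """Total cost of one lot under the cost model (1.1).

    S_N, S_n : sums of the quality variables over the lot and over the sample
    N, n     : lot size and sample size
    """
    inspection = s1 * S_n + s2 * n          # paid whatever the decision is
    if action == "accept":
        return a1 * (S_N - S_n) + a2 * (N - n) + inspection
    if action == "reject":
        return r1 * (S_N - S_n) + r2 * (N - n) + inspection
    raise ValueError("action must be 'accept' or 'reject'")


# Hypothetical lot: 1000 items, 30 defective, sample of 50 with 4 defectives.
costs = dict(a1=10.0, a2=0.0, r1=0.0, r2=0.5, s1=1.0, s2=0.2)
accept_cost = lot_cost("accept", S_N=30, S_n=4, N=1000, n=50, **costs)
reject_cost = lot_cost("reject", S_N=30, S_n=4, N=1000, n=50, **costs)
print(accept_cost, reject_cost)
```

In practice $S_N$ is not observed when the decision is made, which is why the paper works with expected costs rather than with realized costs of this kind.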


The cost model (1.1) includes a wide variety of sampling inspection and acceptance sampling problems corresponding to various choices of the cost parameters and the family of distribution functions $F(x \mid \lambda)$. Similar formulations of this problem have been given by several authors, notably [1], [2], and [3].

The authors of [1] and [2] have attempted to characterize optimal sample sizes in terms of a minimax criterion. When the lot size is large this approach seems to lead to sample sizes which are appropriate when the true state of nature (value of $\lambda$) has a high a priori probability of being very close to the "indifference state" where either acceptance or rejection leads to the same expected cost. Such an a priori assumption about the true state of nature will not generally be reasonable, which suggests that the minimax criterion is not suitable for this problem.

The purpose of this paper is to find explicit asymptotic characterizations for large $N$ of the decision procedures and sample sizes which are optimal in the Bayes sense for various classes of a priori probability distributions defined over the values of the parameter $\lambda$. This problem is considered for certain families of distribution functions $F(x \mid \lambda)$ of the exponential type having the property that $E(X \mid \lambda) = \lambda$. This parametrization of distribution functions of the exponential type is convenient because (1) for this case the range of possible values of $X$ coincides with the range of $\lambda$, and (2) any available a priori information will usually be most easily expressed in terms of the expected quality of an item, i.e., the value of the parameter $\lambda$.

The two principal reasons for investigating the Bayes solutions to this problem are as follows: (1) In most practical situations the statistician will possess some subjective a priori information concerning the probable values of the parameter $\lambda$ and such information may often be reasonably summarized and made objective by the choice of a suitable a priori distribution; (2) for statistical decision problems of the type under consideration, the class of Bayes decision procedures coincides with the admissible class so that all procedures discussed will have the optimal property of admissibility (see, e.g., [4]).

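Each family of distribution functions $F(x \mid \lambda)$ to be considered will be defined in terms of a given measure $\mu$ on the Borel sets of the positive real half-line as follows: Let

(1.2)   $a = \inf\{x : \int_{[0,\, x+\epsilon)} d\mu > 0, \text{ all } \epsilon > 0\}$,
        $b = \sup\{x : \int_{(x-\epsilon,\, \infty)} d\mu > 0, \text{ all } \epsilon > 0\}$.

Let $I_\mu$ be the interval $[a, b]$ if $b < \infty$ and $[a, \infty)$ if $b = \infty$. We assume that $\mu$ satisfies the conditions

(1.3)   A) $I_\mu$ is non-empty,
        B) There exists a function $\omega(\lambda)$ such that for each $\lambda \in I_\mu$

        $$\frac{\int_{[a,\infty)} x\, e^{\omega(\lambda)x}\, d\mu(x)}{\int_{[a,\infty)} e^{\omega(\lambda)x}\, d\mu(x)} = \lambda.$$

We now define $F(x \mid \lambda)$ by

(1.4)   $$F(x \mid \lambda) = \begin{cases} 0, & x \le a, \\[4pt] \dfrac{\int_{[a,x)} e^{\omega(\lambda)t}\, d\mu(t)}{\int_{[a,\infty)} e^{\omega(\lambda)t}\, d\mu(t)}, & a < x \le b, \\[4pt] 1, & x > b. \end{cases}$$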

The following theorems concerning such families are proved in Section 2.

THEOREM 2.1: The function $\omega(\lambda)$ given by (1.3B) is unique and $d\omega(\lambda)/d\lambda$ exists and is positive for $\lambda \in I_\mu$.

THEOREM 2.2: If $F(x \mid \lambda)$ is defined by (1.4), then all moments of $F(x \mid \lambda)$ exist and all derivatives of $\omega(\lambda)$ exist and are finite for $\lambda \in I_\mu$.

THEOREM 2.3: The distribution function $F(x \mid \lambda)$ defined by (1.4) may be represented for $a < x \le b$ and for $\lambda \in I_\mu$ by

(1.5)   $$F(x \mid \lambda) = K(\gamma) \int_{[a,x)} \exp\Big\{\omega(\lambda)t - \int_\gamma^\lambda u\,\omega'(u)\, du\Big\}\, d\mu(t),$$

if and only if assumption (1.3B) is satisfied, where $\gamma \in I_\mu$ and $K(\gamma)$ is a normalizing factor depending on the choice of $\gamma$ and determined so that $F(b+ \mid \lambda) = 1$.

We may unambiguously define $F(x \mid a)$ and, if $b$ is finite, $F(x \mid b+)$ by

(1.6)   $F(x \mid a) = \lim_{\lambda \downarrow a} F(x \mid \lambda)$,

and

(1.7)   $F(x \mid b+) = \lim_{\lambda \uparrow b} F(x \mid \lambda)$.

It is easily verified that the $n$-fold convolution of $F(x \mid \lambda)$ may be written

(1.8)   $$F^{(n)}(x \mid \lambda) = (K(\gamma))^n \int_{[na,x)} \exp\Big\{\omega(\lambda)t - n\int_\gamma^\lambda u\,\omega'(u)\, du\Big\}\, d\mu^{(n)}(t),$$

where $\mu^{(n)}$ is the $n$-fold convolution of $\mu$. We define the interval $I_\mu^{(n)} = [na, nb]$ if $b < \infty$ and $I_\mu^{(n)} = [na, \infty)$ if $b = \infty$. Now since $X_1, X_2, \ldots, X_n$ are assumed to be independent with common distribution function $F(x \mid \lambda)$, we see that the sum $S_n$ is distributed according to (1.8). Furthermore, by applying the factorization criterion for sufficiency to the joint distribution of $X_1, X_2, \ldots, X_n$ (see, e.g., [5]) it is easily seen that $S_n$ is a sufficient statistic for the problem under consideration so that we may confine our attention to decision procedures depending only on the value of $S_n$.

Some particular examples of families of distribution functions $F(x \mid \lambda)$ which are of practical interest are as follows:

Example 1: If $\mu$ is the counting measure on the integers zero and one, and $\omega(\lambda) = \ln(\lambda/(1 - \lambda))$ for $0 < \lambda < 1$, then $F(x \mid \lambda)$ is the distribution function of a Bernoulli random variable taking on the value one with probability $\lambda$ and zero with probability $1 - \lambda$.

Example 2: If $\nu$ is the counting measure on the non-negative integers, $d\mu(x)/d\nu(x) = 1/x!$, and $\omega(\lambda) = \ln \lambda$ for $0 < \lambda < \infty$, then $F(x \mid \lambda)$ is the distribution of a Poisson random variable with expected value $\lambda$.

Example 3: If $\nu$ is Lebesgue measure on the positive half-line, $d\mu(x)/d\nu(x) = \eta x^{\eta-1}/\Gamma(\eta)$ for known $\eta > 0$, and $\omega(\lambda) = -\eta/\lambda$, then for each $\eta$, $F(x \mid \lambda)$ is a gamma distribution with $E(X \mid \lambda) = \lambda$ and $\mathrm{Var}(X \mid \lambda) = \lambda^2/\eta$.

In order to discuss the properties of the Bayes sample size it will be necessary to consider a further specialization of the class of distribution functions $F(x \mid \lambda)$. To this end we let

(1.9)   $$\omega(\lambda) = \frac{1}{k} \ln \frac{\lambda}{k\alpha + \beta\lambda},$$

where $k$ is a positive number and $\alpha$ and $\beta$ are numbers such that either (i) $\alpha > 0$ and $\beta \ge 0$, or (ii) $\alpha > 0$ and $\beta = -\alpha/b^*$ where $b^*$ is a positive integer. Let $\mu(x)$ be a measure such that

(1.10)  $$\frac{d\mu(x)}{d\nu(x)} = \begin{cases} 1, & x = 0, \\[4pt] \dfrac{\alpha(\alpha + \beta) \cdots \big(\alpha + (\tfrac{x}{k} - 1)\beta\big)}{(\tfrac{x}{k})!}, & x = k, 2k, \ldots, \end{cases}$$

where $\nu$ is counting measure on $0, k, 2k, \ldots$. For case (i) $I_\mu = [0, \infty)$ and for case (ii) $I_\mu = [0, b]$, where $b = kb^*$. These definitions permit us to define the class $\mathcal{F}_1$ of distribution functions $F(x \mid \lambda)$ as follows:

(1.11)  $\mathcal{F}_1$ = the class of distribution functions $F(x \mid \lambda)$ of the form (1.4) for which the corresponding $\omega(\lambda)$ and $\mu(x)$ are determined by (1.9) and (1.10) respectively.

The class of distribution functions $\mathcal{F}_1$ clearly contains the Bernoulli ($\alpha = k = 1$, $\beta = -1$) and Poisson ($\alpha = k = 1$, $\beta = 0$) examples discussed earlier. That the distribution functions in the class $\mathcal{F}_1$ are well defined follows from
THEOREM 2.4: If the function $\omega(\lambda)$ and the measure $\mu(x)$ are defined by (1.9) and (1.10), then condition (1.3B) is satisfied.
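
Theorem 2.4 can be checked numerically for particular parameter values. The sketch below assumes the readings $\omega(\lambda) = (1/k)\ln(\lambda/(k\alpha + \beta\lambda))$ for (1.9) and $r(kj) = \alpha(\alpha+\beta)\cdots(\alpha+(j-1)\beta)/j!$ for (1.10), truncates the infinite sum at a fixed number of terms, and verifies that the mean of the resulting distribution equals $\lambda$. The function name `tilted_mean` and the parameter values are illustrative only.

```python
def tilted_mean(lam, alpha, beta, k=1.0, terms=400):
    """Mean of the distribution with mass proportional to
    r(k*j) * exp(omega(lam) * k*j) on the points 0, k, 2k, ...

    By (1.9), exp(k * omega(lam)) = lam / (k*alpha + beta*lam) =: t,
    so the mass at k*j is proportional to r(k*j) * t**j.
    """
    t = lam / (k * alpha + beta * lam)
    num = den = 0.0
    mass = 1.0                                   # r(0) * t**0
    for j in range(terms):
        num += k * j * mass
        den += mass
        # r(k(j+1)) = r(kj) * (alpha + j*beta) / (j + 1), from (1.10)
        mass *= (alpha + j * beta) / (j + 1) * t
    return num / den


bernoulli = tilted_mean(0.3, alpha=1.0, beta=-1.0)       # case (ii), b* = 1
poisson = tilted_mean(2.5, alpha=1.0, beta=0.0)          # case (i), beta = 0
neg_binomial = tilted_mean(3.0, alpha=2.0, beta=1.0)     # case (i), beta > 0
scaled = tilted_mean(3.0, alpha=1.0, beta=0.0, k=2.0)    # support 0, 2, 4, ...
print(bernoulli, poisson, neg_binomial, scaled)
```

Each value should agree with the corresponding $\lambda$ up to truncation and rounding error, which is what condition (1.3B) requires.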

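The form of the $n$-fold convolution $F^{(n)}(x \mid \lambda)$ of a distribution function $F(x \mid \lambda)$ in the class $\mathcal{F}_1$ is needed in the derivation of an asymptotic expansion for the Bayes risk. The following theorem gives a formula for $F^{(n)}(x \mid \lambda)$.

THEOREM 2.5: If $F(x \mid \lambda) \in \mathcal{F}_1$, then for all integer values of $m$, $F^{(n)}(x \mid \lambda)$ is given by

(1.12)  $$F^{(n)}(km \mid \lambda) = \begin{cases} 0, & m = 0, \\[4pt] 1 - m\, r^{(n)}(km) \displaystyle\int_0^\lambda \frac{t^{m-1}}{(k\alpha + \beta t)^m} \exp\Big\{-n \int_0^t \frac{\alpha\, du}{k\alpha + \beta u}\Big\}\, dt, & m = 1, 2, \ldots, \end{cases}$$

where

(1.13)  $$r^{(n)}(x) = \begin{cases} 1, & x = 0, \\[4pt] \dfrac{n\alpha(n\alpha + \beta) \cdots \big(n\alpha + (\tfrac{x}{k} - 1)\beta\big)}{(\tfrac{x}{k})!}, & x = k, 2k, \ldots. \end{cases}$$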

In addition to the class $\mathcal{F}_1$ of discrete distributions we will consider the class of continuous gamma type families defined by (1.4) with

(1.14)  $\omega(\lambda) = -\eta/\lambda$,

and

(1.15)  $$\frac{d\mu(x)}{d\nu(x)} = \begin{cases} 0, & x \le 0, \\[4pt] \dfrac{\eta x^{\eta-1}}{\Gamma(\eta)}, & x > 0, \end{cases}$$

where $\nu(x)$ is Lebesgue measure on the positive half-line. The class $\mathcal{F}_2$ is defined by

(1.16)  $\mathcal{F}_2$ = the family of distribution functions $F(x \mid \lambda)$ of the form (1.4) with $\omega(\lambda)$ and $\mu(x)$ given by (1.14) and (1.15).

This definition leads to

THEOREM 2.6: For any distribution function $F(x \mid \lambda) \in \mathcal{F}_2$, (i) condition (1.3B) is satisfied, and (ii)

(1.17)  $$F^{(n)}(x \mid \lambda) = \frac{(\eta x)^{\eta n}}{\Gamma(\eta n)} \int_\lambda^\infty u^{-\eta n - 1} \exp\Big\{-\frac{\eta x}{u}\Big\}\, du.$$
Returning now to the underlying decision problem, for any fixed $n$ let $\delta(s_n)$ be a decision rule which is to be interpreted as the probability of acceptance of the uninspected remainder of the lot when $s_n$ is the observed value of the sufficient statistic $S_n$. Regarding $\lambda$ as a random variable $\Lambda$ by virtue of the assumed existence of an a priori probability distribution we observe that, given the value $\lambda$ of $\Lambda$, $S_N - S_n$ and $S_n$ are conditionally independently distributed according to $F^{(N-n)}(x \mid \lambda)$ and $F^{(n)}(x \mid \lambda)$ respectively so that

(1.18)  $$\begin{aligned} E\{S_N - S_n \mid S_n\} &= E\{E\{S_N - S_n \mid \Lambda, S_n\} \mid S_n\} \\ &= E\{(N - n)\Lambda \mid S_n\} \\ &= (N - n)E\{\Lambda \mid S_n\}. \end{aligned}$$

Hence, referring to (1.1) the risk incurred by using the rule $\delta$ may be written as

(1.19)  $$\begin{aligned} R(\delta, n, N) &= E\{\delta(S_n)[a_1(S_N - S_n) + a_2(N - n)]\} \\ &\quad + E\{[1 - \delta(S_n)][r_1(S_N - S_n) + r_2(N - n)]\} + E\{s_1 S_n + s_2 n\} \\ &= E\{\delta(S_n)[(a_1 - r_1)(S_N - S_n) + (a_2 - r_2)(N - n)]\} \\ &\quad + (s_1 n + r_1(N - n))E(\Lambda) + s_2 n + r_2(N - n) \\ &= (N - n)E\{\delta(S_n)[(a_1 - r_1)E(\Lambda \mid S_n) + a_2 - r_2]\} \\ &\quad + [s_1 n + r_1(N - n)]E(\Lambda) + s_2 n + r_2(N - n). \end{aligned}$$

From this it is clear that the essentially unique Bayes decision rule is given by

(1.20)  $$\delta^*(s_n) = \begin{cases} 1, & E\{\Lambda \mid S_n = s_n\} \le c, \\ 0, & \text{otherwise}, \end{cases}$$

where $c = (r_2 - a_2)/(a_1 - r_1)$, provided $a < c < b$. To avoid trivial cases where acceptance or rejection is determined without sampling we shall assume that $a < c < b$, which implies either (1) $r_1 < a_1$ and $r_2 > a_2$, or (2) $r_1 > a_1$ and $r_2 < a_2$. Referring to (1.1) we see that for any given cost situation where (1) holds we may find a corresponding second situation where (2) holds which becomes identical with the first when the two actions are interchanged. Hence in the sequel we shall assume without loss of generality that (2) holds.
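
For a conjugate prior the posterior mean $E\{\Lambda \mid S_n = s_n\}$ is available in closed form, and the Bayes rule can be written down directly. The sketch below is an illustration under assumptions that are not part of the paper: Bernoulli items, a Beta($p$, $q$) prior, and hypothetical cost values. The decision is read off from the last expression for the risk $R(\delta, n, N)$: acceptance is preferred exactly when $(a_1 - r_1)E\{\Lambda \mid S_n\} + a_2 - r_2 \le 0$, which for $a_1 > r_1$ is the condition $E\{\Lambda \mid S_n\} \le c$ with $c = (r_2 - a_2)/(a_1 - r_1)$.

```python
def posterior_mean_beta(s, n, p, q):
    """E{Lambda | S_n = s} for Bernoulli items under a Beta(p, q) prior
    (standard conjugate update; the prior family is an assumption here)."""
    return (p + s) / (p + q + n)


def accept_probability(s, n, p, q, a1, a2, r1, r2):
    """Return 1 (accept the remainder) or 0 (reject it).

    Acceptance minimizes the risk when the coefficient of delta(S_n),
    (a1 - r1) * E{Lambda | S_n} + (a2 - r2), is non-positive.
    """
    coefficient = (a1 - r1) * posterior_mean_beta(s, n, p, q) + (a2 - r2)
    return 1 if coefficient <= 0 else 0


# Hypothetical costs: c = (r2 - a2) / (a1 - r1) = 0.05, uniform prior, n = 50.
costs = dict(a1=10.0, a2=0.0, r1=0.0, r2=0.5)
decisions = [accept_probability(s, 50, 1.0, 1.0, **costs) for s in range(5)]
print(decisions)
```

With these numbers the lot is accepted when at most one defective is found in the sample, since $(1 + s)/52 \le 0.05$ only for $s \le 1$.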

Unfortunately, for many a priori distributions and many families $F(x \mid \lambda)$ of interest the quantity $E\{\Lambda \mid S_n = s_n\}$ cannot be expressed explicitly. The following results, which are proved in Section 3, give more explicit characterizations of $\delta^*$ for the case where $n$ is large. These results are also needed for the determination of the Bayes sample size.

Let $G(\lambda)$ be the a priori distribution function of the parameter $\lambda$, i.e., $G(\lambda) = P\{\Lambda < \lambda\}$. We assume that $G(\lambda)$ assigns probability one to the closed interval $[a, b]$. We assume further that $E(\Lambda)$ is finite and that $G(\lambda)$ does not assign probability one to any single point. Define the function $\varphi_n(t)$ for $t \in I_\mu^{(n)}$ by

(1.21)  $$\varphi_n(t) = \frac{\int_0^\infty \lambda \exp\big\{t\omega(\lambda) - n\int_\gamma^\lambda u\,\omega'(u)\, du\big\}\, dG(\lambda)}{\int_0^\infty \exp\big\{t\omega(\lambda) - n\int_\gamma^\lambda u\,\omega'(u)\, du\big\}\, dG(\lambda)}.$$

It is easily verified that $\varphi_n(t)$ coincides with $E\{\Lambda \mid S_n = t\}$ for almost all values of $t$ which are possible values of $S_n$. From this definition we obtain

THEOREM 3.1: The function $\varphi_n(t)$ given by (1.21) is finite and strictly increasing for $t \in I_\mu^{(n)}$.

We now observe that exactly one of the following must hold:

(1.22)  i) $\varphi_n(nb) < c$,
        ii) $\varphi_n(na) > c$,
        iii) $\varphi_n(t_c) = c$ for a unique $t_c \in I_\mu^{(n)}$.

Hence the Bayes decision rule $\delta^*(S_n)$ given by (1.20) is equivalently expressed by

(1.23)  $$\delta^*(S_n) = \begin{cases} 1, & \text{if } S_n \le t(n), \\ 0, & \text{otherwise}, \end{cases}$$

where

(1.24)  $$t(n) = \begin{cases} na - 1, & \varphi_n(nb) < c, \\ nb, & \varphi_n(na) > c, \\ t_c, & \varphi_n(t_c) = c,\ t_c \in I_\mu^{(n)}. \end{cases}$$

An asymptotic characterization of the function $t(n)$ for a priori distributions $G(\lambda)$ placing positive weight on both sides of $c$ is given by

THEOREM 3.2: If $\lambda_0 = \sup\{\lambda : \lambda \le c;\ G(\lambda+) - G(\lambda - \epsilon) > 0, \text{ all } \epsilon > 0\}$ and $\lambda_1 = \inf\{\lambda : \lambda \ge c;\ G(\lambda + \epsilon) - G(\lambda) > 0, \text{ all } \epsilon > 0\}$, then

(1.25)  $$\lambda_0 \le \liminf_{n \to \infty} \frac{t(n)}{n} \le \limsup_{n \to \infty} \frac{t(n)}{n} \le \lambda_1.$$

Thus for the particular case where $G(\lambda)$ assigns positive weight to every interval about $c$ we have $\lambda_0 = \lambda_1$ and

(1.26)  $t(n) = cn + o(n)$.

In order to obtain a more precise asymptotic characterization of $t(n)$ we define two classes of a priori distributions as follows:

(1.27)  $\mathcal{G}_1$ = the class of all $G(\lambda)$ which are twice continuously differentiable in some open interval about $c$ with $G'(c) > 0$;

(1.28)  $\mathcal{G}_2$ = the class of all $G(\lambda)$ for which there exist numbers $l_G$ and $u_G$, $l_G < c < u_G$, which are assigned positive weight by $G(\lambda)$ and are such that $G(u_G) - G(c+) = 0$ and $G(c) - G(l_G+) = 0$.

The class of a priori distributions assigning probability one to a finite set of points is of course a subset of $\mathcal{G}_2$. We now have

THEOREM 3.3: If $G(\lambda) \in \mathcal{G}_1$, then

(1.29)  $$t(n) = cn + \frac{\omega''(c)}{(\omega'(c))^2} - \frac{G''(c)}{G'(c)\,\omega'(c)} + o(1).$$
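
The expansion of Theorem 3.3 can be compared with an exact threshold in a case where the posterior mean is explicit. The sketch below assumes the reading $t(n) = cn + \omega''(c)/(\omega'(c))^2 - G''(c)/(G'(c)\omega'(c)) + o(1)$ of (1.29), Bernoulli items with $\omega(\lambda) = \ln(\lambda/(1-\lambda))$, and a Beta($p$, $q$) prior, which is an illustrative choice not made in the paper. For that prior $E\{\Lambda \mid S_n = t\} = (p + t)/(p + q + n)$, so the equation $E\{\Lambda \mid S_n = t\} = c$ has the solution $t = c(n + p + q) - p$ when $t$ is treated as a real variable.

```python
def t_exact(n, c, p, q):
    """Solution of (p + t) / (p + q + n) = c for t (Beta-Bernoulli case)."""
    return c * (n + p + q) - p


def t_asymptotic(n, c, p, q):
    """Leading terms of the expansion in Theorem 3.3 for the same case."""
    w1 = 1.0 / (c * (1.0 - c))                        # omega'(c)
    w2 = (2.0 * c - 1.0) / (c * (1.0 - c)) ** 2       # omega''(c)
    # G''(c) / G'(c) for the Beta(p, q) density, by logarithmic differentiation
    g_ratio = (p - 1.0) / c - (q - 1.0) / (1.0 - c)
    return c * n + w2 / w1 ** 2 - g_ratio / w1


pairs = [
    (t_exact(100, 0.2, 2.0, 5.0), t_asymptotic(100, 0.2, 2.0, 5.0)),
    (t_exact(50, 0.4, 3.0, 1.5), t_asymptotic(50, 0.4, 3.0, 1.5)),
]
print(pairs)
```

In this conjugate case the two expressions agree identically, so the $o(1)$ term vanishes; for other priors in $\mathcal{G}_1$ only asymptotic agreement should be expected.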
THEOREM 3.4: If $G(\lambda) \in \mathcal{G}_2$, $f_1 = G(l_G+) - G(l_G)$, and $f_2 = G(u_G+) - G(u_G)$, then

sha ; uw'(u) dus In eae ae ia 

€ 2\4¢¢@ 

te): A 8 ee ee. ae ee 

Although the classes $\mathcal{G}_1$ and $\mathcal{G}_2$ are not exhaustive we have characterized $t(n)$ and hence $\delta^*(s_n)$ sufficiently for most practical purposes as long as $n$ is reasonably large. We now turn our attention to the problem of determining the Bayes sample size $n^* = n^*(N)$ which minimizes the risk $R(\delta^*, n, N)$, and seek an asymptotic characterization of the Bayes sample size $n^*(N)$ for large $N$.

The parameter $k$ appearing in (1.9) is essentially a scale parameter in the sense that if $X$ is a random variable distributed according to (1.9) with $k = 1$ and if $\lambda$ is replaced by $\lambda^*/k$ then $kX$ has a distribution of the same form as (1.9) with arbitrary $k$ and with $\lambda^*$ playing the role of $\lambda$. A similar remark applies to the parameter $\eta$ appearing in (1.15). Hence the cases where $k$ and $\eta$ are arbitrary may be obtained from the cases where $k = 1$ and $\eta = 1$ by multiplying the appropriate cost coefficients by $k$ or $\eta$ and making suitable changes of variables in the a priori distribution functions. For the sake of simplicity the remaining results are stated for the cases $k = 1$ and $\eta = 1$ only.

The asymptotic behavior of $n^*(N)$ is characterized by the following theorems, which are proved in Section 4.

THEOREM 4.1: If $F(x \mid \lambda) \in \mathcal{F}_1$ (with $k = 1$) and $G(\lambda) \in \mathcal{G}_1$, then the Bayes risk for fixed $n$ and $N$ is given by


R(6*,n,N) = n((s, — 1) E(A) + (8 — 1m) + (n — a) f — c) dG(d)) 
+ Nn E(A) + 2+ (a =n) [ = 0) dGQ)) 
+ (N —n)- eee + (N —n)o (‘) 


2an 


and the Bayes sample size is 


N, Ag 
(1.32) n*(N) = _— & _ 
N! a(i- a) (a + Be)G'(c) sy" + o(N"), 


\ 2aA @ 


where 


(1.33) Ag = (s — m)E(A) + 8 — T2 + (mn — GQ) l (A — c) dG). 


THEOREM 4.2: If $F(x \mid \lambda) \in \mathcal{F}_1$ (with $k = 1$), $G(\lambda) \in \mathcal{G}_2$ and $A_G$ is defined by (1.33), then the Bayes sample size is given by
$N$, $A_G \le 0$,
(1.34) n*(N) =4 a le 

Kin N — 57 ininN + O(1), Ae > 0. 


(The definition of $K$ is lengthy and is contained in the proof in Section 4.)
THEOREM 4.3: If $F(x \mid \lambda) \in \mathcal{F}_2$ (with $\eta = 1$), $G(\lambda) \in \mathcal{G}_1$ and $A_G$ is defined by (1.33), then the Bayes sample size is given by


N, Ag < 0, 
(1.35) wD ae (Age @ 
2Ae 


THEOREM 4.4: If $F(x \mid \lambda) \in \mathcal{F}_2$ (with $\eta = 1$), $G(\lambda) \in \mathcal{G}_2$ and $A_G$ is defined by (1.33), then the Bayes sample size is given by


1/2 
) + o(N'”), Ag > 0. 


N, Ae 0, 


(1.36) n*(N) = K 
KlnN —->lnnN + O(1), Ag > 0. 


In all cases where $A_G \le 0$ the proper procedure is to screen the lot completely (i.e., take $n^*(N) = N$). Theorems 4.1 and 4.3 show that if an a priori probability density for the parameter $\lambda$ exists in the vicinity of the critical point $c$ and if this density is smooth and positive at $c$, then the optimal sample size when $A_G > 0$ is approximately proportional to the square root of the lot size $N$ when $N$ is large. For the cases covered by Theorems 4.2 and 4.4, where the a priori probability that $\lambda$ lies within a certain neighborhood of $c$ is zero, the optimal sample size when $A_G > 0$ is approximately proportional to the logarithm of $N$ when $N$ is large. It is clear from these results that the optimal rate of increase for the sample size depends critically on the fine structure of the a priori information about $\lambda$ in the vicinity of $c$. This is especially remarkable in view of the fact that $c$ is actually the "indifference" value of $\lambda$ in the sense that if $\lambda = c$ then either acceptance or rejection of the lot leads to the same expected cost.

Referring to (1.31) and (1.32) of Theorem 4.1 we may write

(1.37)  $$R(\delta^*, n, N) = A_G n + B_G N + C_G (N - n)\Big(\frac{1}{n} + o\Big(\frac{1}{n}\Big)\Big),$$

and if $A_G > 0$

(1.38)  $$n^*(N) = \Big(\frac{C_G}{A_G}\Big)^{1/2} N^{1/2} + o(N^{1/2}),$$

where $A_G$ is given by (1.33), and $B_G$ and $C_G$ are coefficients depending on the costs and the a priori distribution $G$. It is easily verified that the term $B_G N$ represents the "subminimal" risk which would result if the value $\lambda$ of $\Lambda$ were known exactly without sampling and the decision to accept or reject determined accordingly. From (1.37) and (1.38) we obtain

(1.39)  $$R(\delta^*, n^*(N), N) = B_G N + 2(A_G C_G)^{1/2} N^{1/2} + o(N^{1/2}),$$

which shows that the amount by which the Bayes risk exceeds the subminimal risk due to the uncertainty concerning the value of $\Lambda$ is of smaller order in $N$ than the subminimal risk itself. Expression (1.39) is still valid if the sample size is determined by taking only the first term in the asymptotic expansion for $n^*(N)$, so that not much is lost by making this approximation if $N$ is large. Similar remarks may be made for the cases covered by Theorems 4.2, 4.3 and 4.4. For the cases covered by Theorems 4.2 and 4.4 the term added to the subminimal risk in the expressions for $R(\delta^*, n^*(N), N)$ is of order $\ln N$.
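
The square-root rule can be illustrated by minimizing the leading terms of the risk numerically. The sketch below drops the $o(1/n)$ remainder, takes the risk to be $A_G n + B_G N + C_G (N - n)/n$, and compares a brute-force minimizer over $n = 1, \ldots, N$ with $(C_G/A_G)^{1/2} N^{1/2}$. The coefficient values are hypothetical and are not derived from any particular cost structure or prior.

```python
import math


def leading_risk(n, N, A, B, C):
    """Leading terms of the Bayes risk: A*n + B*N + C*(N - n)/n."""
    return A * n + B * N + C * (N - n) / n


def best_n_bruteforce(N, A, B, C):
    """Integer n in 1..N minimizing the leading terms of the risk."""
    return min(range(1, N + 1), key=lambda n: leading_risk(n, N, A, B, C))


def best_n_asymptotic(N, A, C):
    """First term of the asymptotic sample size: sqrt(C/A) * sqrt(N)."""
    return math.sqrt(C / A) * math.sqrt(N)


N, A, B, C = 10_000, 0.5, 1.0, 2.0          # hypothetical coefficients, A > 0
n_brute = best_n_bruteforce(N, A, B, C)
n_asym = best_n_asymptotic(N, A, C)
excess = leading_risk(n_brute, N, A, B, C) - B * N
print(n_brute, n_asym, excess, 2.0 * math.sqrt(A * C * N))
```

The excess over the subminimal risk $B_G N$ should be close to $2(A_G C_G)^{1/2} N^{1/2}$, which is of smaller order than $N$.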


2. Theorems concerning the class of distribution functions $F(x \mid \lambda)$.

THEOREM 2.1: The function $\omega(\lambda)$ given by (1.3B) is unique and $d\omega(\lambda)/d\lambda$ exists and is positive for $\lambda \in I_\mu$.

PROOF: By assumption (1.3) there exist numbers $\omega_0$, $\omega_1$ such that the ratio

(2.1)   $$\rho(\omega) = \frac{\int_{[a,\infty)} x\, e^{\omega x}\, d\mu(x)}{\int_{[a,\infty)} e^{\omega x}\, d\mu(x)}$$

is finite for $\omega_0 < \omega < \omega_1$. Furthermore, $\rho(\omega)$ is differentiable with respect to $\omega$ and

(2.2)   $$\rho'(\omega) = \frac{\int_{[a,\infty)} (x - \rho(\omega))^2 e^{\omega x}\, d\mu(x)}{\int_{[a,\infty)} e^{\omega x}\, d\mu(x)} > 0,$$

for $\omega_0 < \omega < \omega_1$. Now for $\lambda \in I_\mu$ we have, by (1.3B), $\rho(\omega(\lambda)) = \lambda$ which implies that $\omega(\lambda)$ is unique and that $d\omega(\lambda)/d\lambda$ exists and is given by

(2.3)   $$\frac{d\omega(\lambda)}{d\lambda} = \frac{1}{\rho'(\omega(\lambda))}.$$

THEOREM 2.2: If $F(x \mid \lambda)$ is defined by (1.4), then all moments of $F(x \mid \lambda)$ exist, and all derivatives of $\omega(\lambda)$ exist and are finite for $\lambda \in I_\mu$.

PROOF: The function $\omega(\lambda)$ is continuous and strictly increasing for $\lambda \in I_\mu$ by Theorem 2.1. Therefore, the moment generating function given by

(2.4)   $$\frac{\int_{[a,\infty)} e^{(t + \omega(\lambda))x}\, d\mu(x)}{\int_{[a,\infty)} e^{\omega(\lambda)x}\, d\mu(x)}$$

exists for each $\lambda \in I_\mu$ for all $t$ in some open interval about zero since both the numerator and the denominator of the ratio (1.3B) must be finite for any $\lambda \in I_\mu$. That is, for any fixed $\lambda \in I_\mu$, we may choose $t \ne 0$ small enough in magnitude so that there exists a $\lambda^* \in I_\mu$ for which $|t| + \omega(\lambda) < \omega(\lambda^*)$ so that the integral in the numerator must converge.

Repeated formal differentiations of $\rho(\omega)$ yield sums of ratios involving products of integrals of the form $\int_{[a,\infty)} x^i e^{\omega x}\, d\mu(x)$ in the numerators and powers of $\int_{[a,\infty)} e^{\omega x}\, d\mu(x)$ in the denominators. These integrals are finite and those in the denominators do not vanish for $\omega = \omega(\lambda)$, $\lambda \in I_\mu$, so all derivatives of $\rho(\omega)$ exist for such values of $\omega$. As before, for $\lambda \in I_\mu$,

(2.5)   $$\frac{d\omega(\lambda)}{d\lambda} = \frac{1}{\rho'(\omega(\lambda))},$$

and repeated application of the rule for differentiation of implicit functions shows that $\omega(\lambda)$ possesses derivatives of all orders for $\lambda \in I_\mu$.
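
The relation $d\omega(\lambda)/d\lambda = 1/\rho'(\omega(\lambda))$ can be checked by finite differences in the Poisson case of Example 2, where $\rho(\omega) = e^{\omega}$ and $\omega(\lambda) = \ln \lambda$. The sketch below evaluates $\rho$ from its defining ratio of sums, truncated at a fixed number of terms; the helper names and the step size are illustrative choices.

```python
import math


def rho(w, terms=200):
    """Ratio of sums defining rho(w) when mu has density 1/x! with respect
    to counting measure on 0, 1, 2, ... (the Poisson case)."""
    num = den = 0.0
    mass = 1.0                          # exp(w * 0) / 0!
    for x in range(terms):
        num += x * mass
        den += mass
        mass *= math.exp(w) / (x + 1)   # exp(w * (x + 1)) / (x + 1)!
    return num / den


def central_difference(f, x, h=1e-5):
    """Symmetric finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


lam = 2.0
rho_prime = central_difference(rho, math.log(lam))    # should be close to lam
omega_prime = central_difference(math.log, lam)       # should be close to 1/lam
print(rho_prime, omega_prime, 1.0 / rho_prime)
```

Here $\rho'(\omega(\lambda))$ is the variance of the Poisson distribution with mean $\lambda$, which is the interpretation given by the integrand $(x - \rho(\omega))^2$ in the expression for $\rho'(\omega)$.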


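THEOREM 2.3: The distribution function $F(x \mid \lambda)$ defined by (1.4) may be represented for $a < x \le b$ and for $\lambda \in I_\mu$ by

(2.6)   $$F(x \mid \lambda) = K(\gamma) \int_{[a,x)} \exp\Big\{\omega(\lambda)t - \int_\gamma^\lambda u\,\omega'(u)\, du\Big\}\, d\mu(t),$$

if and only if assumption (1.3B) is satisfied, where $\gamma \in I_\mu$ and $K(\gamma)$ is a normalizing factor depending on the choice of $\gamma$ determined so that $F(b+ \mid \lambda) = 1$.

PROOF: We observe that

(2.7)   $$\frac{d}{d\lambda} \int_{[a,\infty)} e^{\omega(\lambda)x}\, d\mu(x) = \omega'(\lambda) \int_{[a,\infty)} x\, e^{\omega(\lambda)x}\, d\mu(x)$$

so that dividing both sides by $\int_{[a,\infty)} e^{\omega(\lambda)x}\, d\mu(x)$ and referring to assumption (1.3B) we have for $\lambda \in I_\mu$,

(2.8)   $$\frac{d}{d\lambda} \ln \int_{[a,\infty)} e^{\omega(\lambda)x}\, d\mu(x) = \lambda\,\omega'(\lambda).$$

Hence

(2.9)   $$\int_{[a,\infty)} e^{\omega(\lambda)x}\, d\mu(x) = \exp\Big\{\int \lambda\,\omega'(\lambda)\, d\lambda + C\Big\} = \frac{1}{K(\gamma)} \exp\Big\{\int_\gamma^\lambda u\,\omega'(u)\, du\Big\}$$

for $\gamma \in I_\mu$. The fact that assumption (1.3B) is satisfied whenever (2.6) is valid follows immediately by differentiating the expression $F(b+ \mid \lambda) = 1$ with respect to $\lambda$.

We now verify that the classes $\mathcal{F}_1$ and $\mathcal{F}_2$ defined by (1.11) and (1.16) satisfy the above assumptions, and determine closed expressions for the $n$-fold convolution of distribution functions in these classes.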

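THEOREM 2.4: If the function $\omega(\lambda)$ and the measure $\mu(x)$ are defined by (1.9) and (1.10), then condition (1.3B) is satisfied.

PROOF: Let the function $r(x)$, $x = 0, k, 2k, \ldots$ be defined by the generating function

(2.10)  $$\sum_{x=0}^{\infty} r(kx)\, t^x = \begin{cases} e^{\alpha t}, & \beta = 0, \\ (1 - \beta t)^{-\alpha/\beta}, & \beta \ne 0. \end{cases}$$

It is easily verified by successive differentiation of (2.10) that $r(x) = d\mu(x)/d\nu(x)$ as given by (1.10). Substituting $\lambda/(k\alpha + \beta\lambda)$ for $t$ in (2.10), we obtain

(2.11)  $$\sum_{x=0}^{\infty} r(kx) \Big(\frac{\lambda}{k\alpha + \beta\lambda}\Big)^x = \begin{cases} e^{\lambda/k}, & \beta = 0, \\ \big(1 + \frac{\beta\lambda}{k\alpha}\big)^{\alpha/\beta}, & \beta \ne 0. \end{cases}$$

Noting that $\omega(\lambda) = (1/k) \ln(\lambda/(k\alpha + \beta\lambda))$ by (1.9) we may write (2.11) as

(2.12)  $$\int_{[0,\infty)} e^{\omega(\lambda)x}\, d\mu(x) = \begin{cases} e^{\lambda/k}, & \beta = 0, \\ \big(1 + \frac{\beta\lambda}{k\alpha}\big)^{\alpha/\beta}, & \beta \ne 0. \end{cases}$$

Differentiating (2.12) with respect to $\lambda$ and dividing both members by $\omega'(\lambda) = \alpha/(\lambda(k\alpha + \beta\lambda))$, we obtain

(2.13)  $$\int_{[0,\infty)} x\, e^{\omega(\lambda)x}\, d\mu(x) = \begin{cases} \lambda e^{\lambda/k}, & \beta = 0, \\ \lambda \big(1 + \frac{\beta\lambda}{k\alpha}\big)^{\alpha/\beta}, & \beta \ne 0. \end{cases}$$

The proof is completed by noting that the ratio of (2.13) to (2.12) is always $\lambda$ so that (1.3B) is satisfied.

Theorems 2.3 and 2.4 imply that for any $F(x \mid \lambda) \in \mathcal{F}_1$

(2.14)  $$F(kx \mid \lambda) = \sum_{t=0}^{x-1} r(kt) \exp\Big\{kt\,\omega(\lambda) - \int_0^\lambda u\,\omega'(u)\, du\Big\},$$

for integer values of $x$, where $\omega(\lambda)$ is given by (1.9), $r(x) = d\mu(x)/d\nu(x)$ is given by (1.10), and the value of $\gamma$ appearing in (2.6) is taken to be zero. The fact that we must have $K(0) = 1$ if $F(b+ \mid \lambda)$ is to equal one follows from the observation that $\omega(\lambda) \to -\infty$ as $\lambda \to 0$ together with the assumption that $r(0) = 1$.

The following theorem provides an integral formula for the $n$-fold convolution $F^{(n)}$ of $F(x \mid \lambda)$ when $F$ is in $\mathcal{F}_1$.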
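THEOREM 2.5: If $F(x \mid \lambda) \in \mathcal{F}_1$, then for integer values of $m$, $F^{(n)}$ is given by

(2.15)  $$F^{(n)}(km \mid \lambda) = \begin{cases} 0, & m = 0, \\[4pt] 1 - m\, r^{(n)}(km) \displaystyle\int_0^\lambda \frac{t^{m-1}}{(k\alpha + \beta t)^m} \exp\Big\{-n \int_0^t \frac{\alpha\, du}{k\alpha + \beta u}\Big\}\, dt, & m = 1, 2, \ldots, \end{cases}$$

where

(2.16)  $$r^{(n)}(x) = \begin{cases} 1, & x = 0, \\[4pt] \dfrac{n\alpha(n\alpha + \beta) \cdots \big(n\alpha + (\tfrac{x}{k} - 1)\beta\big)}{(\tfrac{x}{k})!}, & x = k, 2k, \ldots. \end{cases}$$

PROOF: By (1.8)

(2.17)  $$F^{(n)}(\infty \mid \lambda) = \int_{[0,\infty)} \exp\Big\{\omega(\lambda)x - n\int_0^\lambda u\,\omega'(u)\, du\Big\}\, d\mu^{(n)}(x) = 1,$$

where $\mu^{(n)}$ is the $n$-fold convolution of $\mu$. Hence, letting $r^{(n)}(x) = d\mu^{(n)}(x)/d\nu(x)$ and $t = \lambda/(k\alpha + \beta\lambda)$ and recalling that $\omega(\lambda) = (1/k) \ln(\lambda/(k\alpha + \beta\lambda))$ for this case, we have

(2.18)  $$\sum_{x=0}^{\infty} r^{(n)}(kx)\, t^x = \exp\Big\{n \int_0^{k\alpha t/(1 - \beta t)} \frac{\alpha\, du}{k\alpha + \beta u}\Big\} = \begin{cases} e^{n\alpha t}, & \beta = 0, \\ (1 - \beta t)^{-n\alpha/\beta}, & \beta \ne 0. \end{cases}$$

Successive differentiation of (2.18) with respect to $t$ yields (2.16). Referring to (1.8), we may write

(2.19)  $$F^{(n)}(km \mid \lambda) = \sum_{x=0}^{m-1} r^{(n)}(kx) \Big(\frac{\lambda}{k\alpha + \beta\lambda}\Big)^x \exp\Big\{-n \int_0^\lambda \frac{\alpha\, du}{k\alpha + \beta u}\Big\},$$

for integer values of $m$. Assuming that (2.15) holds for some integer $m$, we have

(2.20)  $$\begin{aligned} F^{(n)}(k(m + 1) \mid \lambda) &= F^{(n)}(km \mid \lambda) + r^{(n)}(km) \Big(\frac{\lambda}{k\alpha + \beta\lambda}\Big)^m \exp\Big\{-n \int_0^\lambda \frac{\alpha\, du}{k\alpha + \beta u}\Big\} \\ &= 1 - m\, r^{(n)}(km) \int_0^\lambda \frac{t^{m-1}}{(k\alpha + \beta t)^m} \exp\Big\{-n \int_0^t \frac{\alpha\, du}{k\alpha + \beta u}\Big\}\, dt \\ &\quad + r^{(n)}(km) \Big(\frac{\lambda}{k\alpha + \beta\lambda}\Big)^m \exp\Big\{-n \int_0^\lambda \frac{\alpha\, du}{k\alpha + \beta u}\Big\}. \end{aligned}$$

Integrating the second term on the right by parts yields

(2.21)  $$\begin{aligned} m\, r^{(n)}(km) \int_0^\lambda t^{m-1}(k\alpha + \beta t)^{-m} &\exp\Big\{-n \int_0^t \frac{\alpha\, du}{k\alpha + \beta u}\Big\}\, dt \\ &= r^{(n)}(km)(n\alpha + m\beta) \int_0^\lambda \frac{t^m}{(k\alpha + \beta t)^{m+1}} \exp\Big\{-n \int_0^t \frac{\alpha\, du}{k\alpha + \beta u}\Big\}\, dt \\ &\quad + r^{(n)}(km) \Big(\frac{\lambda}{k\alpha + \beta\lambda}\Big)^m \exp\Big\{-n \int_0^\lambda \frac{\alpha\, du}{k\alpha + \beta u}\Big\}. \end{aligned}$$

Substituting this expression in (2.20) and recalling (2.16), we see that (2.15) is verified for $F^{(n)}(k(m + 1))$. It is easily shown directly that (2.15) holds for $m = 1$ so that the desired result follows by induction.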
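
The integral formula of Theorem 2.5 can be compared numerically with the finite sum from which it is derived. The sketch below assumes the readings of (2.15), (2.16) and (2.19) in which the integrand is $t^{m-1}(k\alpha + \beta t)^{-m} \exp\{-n\int_0^t \alpha\, du/(k\alpha + \beta u)\}$, evaluates the inner integral in closed form, and uses composite Simpson's rule for the outer one. Function names, step counts and parameter values are illustrative.

```python
import math


def r_n(j, n, alpha, beta):
    """r^(n)(k*j): n*alpha (n*alpha + beta) ... (n*alpha + (j-1) beta) / j!."""
    w = 1.0
    for i in range(j):
        w *= (n * alpha + i * beta) / (i + 1)
    return w


def decay(t, n, alpha, beta, k):
    """exp{-n * integral_0^t alpha du / (k*alpha + beta*u)} in closed form."""
    if beta == 0:
        return math.exp(-n * t / k)
    return (1.0 + beta * t / (k * alpha)) ** (-n * alpha / beta)


def cdf_by_sum(m, lam, n, alpha, beta, k=1.0):
    """F^(n)(k*m | lam) = P(S_n < k*m), computed from the finite sum."""
    s = lam / (k * alpha + beta * lam)
    total = sum(r_n(x, n, alpha, beta) * s ** x for x in range(m))
    return total * decay(lam, n, alpha, beta, k)


def cdf_by_integral(m, lam, n, alpha, beta, k=1.0, steps=2000):
    """F^(n)(k*m | lam) from the integral formula, by Simpson's rule."""
    if m == 0:
        return 0.0

    def f(t):
        return t ** (m - 1) / (k * alpha + beta * t) ** m * decay(t, n, alpha, beta, k)

    h = lam / steps                              # steps must be even
    total = f(0.0) + f(lam)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return 1.0 - m * r_n(m, n, alpha, beta) * total * h / 3.0


cases = [
    dict(m=2, lam=0.3, n=5, alpha=1.0, beta=-1.0),   # binomial(5, 0.3)
    dict(m=3, lam=1.2, n=4, alpha=1.0, beta=0.0),    # Poisson with mean 4.8
    dict(m=4, lam=1.5, n=3, alpha=2.0, beta=1.0),    # negative binomial type
]
results = [(cdf_by_sum(**c), cdf_by_integral(**c)) for c in cases]
print(results)
```

In the binomial case the identity reduces to the familiar relation between binomial tail probabilities and the incomplete beta function.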

We now consider the family $\mathcal{F}_2$ of distribution functions obtained when $\omega(\lambda)$ and $\mu(x)$ are defined by (1.14) and (1.15).

THEOREM 2.6: If $F(x \mid \lambda) \in \mathcal{F}_2$, then (i) condition (1.3B) is satisfied, and (ii)

(2.21)  $$F^{(n)}(x \mid \lambda) = \frac{(\eta x)^{\eta n}}{\Gamma(\eta n)} \int_\lambda^\infty u^{-\eta n - 1} e^{-\eta x/u}\, du.$$

PROOF: Any distribution in $\mathcal{F}_2$ is a gamma distribution with parameters determined so that the first moment is $\lambda$, thereby satisfying (1.3B). The convolution of such gamma distributions is well known (cf. [6]) to be given by

(2.22)  $$F^{(n)}(x \mid \lambda) = \int_0^x \frac{\eta^{\eta n}\, t^{\eta n - 1} e^{-\eta t/\lambda}}{\Gamma(\eta n)\, \lambda^{\eta n}}\, dt.$$

By making the change of variable $u = x\lambda/t$ we obtain (2.21).


3. The Bayes decision rule.

THEOREM 3.1: The function $\varphi_n(t)$ given by (1.21) is finite and strictly increasing for $t \in I_\mu^{(n)}$.

PROOF: The finiteness of $E(\Lambda) = \int \lambda\, dG(\lambda)$ insures the finiteness of $\varphi_n(t)$ for all $t \in I_\mu^{(n)}$ with the possible exception of a set of $\mu^{(n)}$-measure zero since

(3.1)   $\varphi_n(t) = E\{\Lambda \mid S_n = t\}$,

for all $t \in I_\mu^{(n)} - A$, where $A$ is the exceptional null set. Hence, for any fixed $t \in I_\mu^{(n)}$, we may choose $t_1, t_2 \in I_\mu^{(n)}$ such that $t_1 \le t \le t_2$ and $\varphi_n(t_1)$, $\varphi_n(t_2)$ are finite. Then $t\omega(\lambda) \le \max(t_1\omega(\lambda), t_2\omega(\lambda))$ for all $\lambda \in I_\mu$ and the finiteness of $\varphi_n(t)$ follows from the finiteness of the integrals in the expressions for $\varphi_n(t_1)$ and $\varphi_n(t_2)$. Now choose $t$ and $\delta > 0$ so that $[t, t + \delta] \subset I_\mu^{(n)}$ and let

(3.2)   $$H_t(z) = \frac{\int_0^z \exp\big\{t\omega(\lambda) - n\int_\gamma^\lambda u\,\omega'(u)\, du\big\}\, dG(\lambda)}{\int_0^\infty \exp\big\{t\omega(\lambda) - n\int_\gamma^\lambda u\,\omega'(u)\, du\big\}\, dG(\lambda)}.$$

Then $H_t(z)$ may be interpreted as the distribution function of some random variable $Z$ and we may write

(3.3)   $$\varphi_n(t + \delta) - \varphi_n(t) = \frac{E\{Z \exp\{\delta\omega(Z)\}\} - E\{Z\}E\{\exp\{\delta\omega(Z)\}\}}{E\{\exp\{\delta\omega(Z)\}\}}.$$

It is intuitively clear and follows rigorously from the inequality on page 43 of [7] that the right hand side of (3.3) is strictly positive for all $\delta > 0$ whenever $G(\lambda)$ and hence $H_t(z)$ are non-degenerate. Thus $\varphi_n(t)$ is strictly increasing and the proof is completed.
We now recall that the Bayes decision rule (1.20) is equivalent to

(3.4)   $$\delta^*(S_n) = \begin{cases} 1, & S_n \le t(n), \\ 0, & S_n > t(n), \end{cases}$$

where $t(n)$ is defined by (1.24).

THEOREM 3.2: If $\lambda_0 = \sup\{\lambda : \lambda \le c;\ G(\lambda+) - G(\lambda - \epsilon) > 0, \text{ all } \epsilon > 0\}$ and $\lambda_1 = \inf\{\lambda : \lambda \ge c;\ G(\lambda + \epsilon) - G(\lambda) > 0, \text{ all } \epsilon > 0\}$, and if $t(n)$ is defined by (1.24), then

(3.5)   $$\lambda_0 \le \liminf_{n \to \infty} \frac{t(n)}{n} \le \limsup_{n \to \infty} \frac{t(n)}{n} \le \lambda_1.$$

PROOF: Let $I$ be an indicator function defined on the product of the space of all sequences of numbers $\{s_n\}$ and the real line by

(3.6)   $$I(\{s_n\}, \lambda) = \begin{cases} 1, & \text{if } s_n/n \to \lambda \text{ as } n \to \infty, \\ 0, & \text{otherwise}. \end{cases}$$

Given $\Lambda = \lambda$, $S_n$ is a sum of conditionally independent identically distributed random variables with mean $\lambda$. Hence, by the strong law of large numbers

(3.7)   $E\{I(\{S_n\}, \lambda) \mid \Lambda = \lambda\} = 1$, a.s.,

and this can be shown to be equivalent to

(3.8)   $E\{I(\{S_n\}, \Lambda) \mid \Lambda\} = 1$, a.s.

Thus

(3.9)   $P\{S_n/n \to \Lambda\} = E\{E\{I(\{S_n\}, \Lambda) \mid \Lambda\}\} = 1$.

Furthermore, by a martingale convergence theorem (cf. p. 398 of [8]),

(3.10)  $P\{E(\Lambda \mid S_n) \to \Lambda\} = 1$.

Therefore for all $x$ which are continuity points of $G(x)$,

(3.11)  $P\{E(\Lambda \mid S_n) < x\} \to G(x)$,

and

(3.12)  $P\{S_n/n < x\} \to G(x)$,

as $n \to \infty$. Suppose that $\limsup_{n \to \infty} (t(n)/n) > \lambda_1$. Then there is a $\delta > 0$ such that $\lambda_1 + \delta$ is a continuity point of $G(x)$, and $t(n)/n > \lambda_1 + \delta$ for arbitrarily large values of $n$. Hence, for these values of $n$

(3.13)  $P\{E(\Lambda \mid S_n) \le c\} = P\{S_n/n \le t(n)/n\} \ge P\{S_n/n \le \lambda_1 + \delta\}$.

However, as $n \to \infty$

(3.14)  $P\{S_n/n \le \lambda_1 + \delta\} \to G(\lambda_1 + \delta)$,

and for sufficiently large $n$

(3.15)  $P\{E(\Lambda \mid S_n) \le c\} \le G(\lambda_1+) + \epsilon$

for arbitrary $\epsilon > 0$. If we choose $\epsilon > 0$ such that $\epsilon < G(\lambda_1 + \delta) - G(\lambda_1+)$ we are led to a contradiction and hence, $\limsup_{n \to \infty} t(n)/n \le \lambda_1$. A similar argument establishes the other inequality of the theorem.

THEOREM 3.3: If $G(\lambda) \in \mathcal{G}_1$, then

(3.16)  $$t(n) = cn + \frac{\omega''(c)}{(\omega'(c))^2} - \frac{G''(c)}{G'(c)\,\omega'(c)} + o(1).$$

The proof of Theorem 3.3 will require a sequence of preliminary lemmas, the first two of which will also be used in the later derivation of an asymptotic expansion for the Bayes risk.

Since $P\{E(\Lambda \mid S_n) \to \Lambda\} = 1$, there exists a $\delta > 0$ such that (i) $c + \delta$ is a continuity point of $G(x)$, (ii) $G(c + \delta) < 1$, and (iii) as $n \to \infty$

(3.17)  $P\{E(\Lambda \mid S_n) > c + \delta\} \to 1 - G(c + \delta) > 0$.

But $E(\Lambda \mid S_n) \le \varphi_n(nb)$, a.s., hence $\varphi_n(nb) > c$ for all sufficiently large $n$. Similarly, $\varphi_n(na) < c$ for all sufficiently large $n$. Hence, referring to (1.22) we see that for all sufficiently large $n$, $t(n)$, defined by (1.24), is the solution to

(3.18)  $$I(n) = \int_0^\infty (\lambda - c) \exp\Big\{t(n)\omega(\lambda) - n\int_\gamma^\lambda u\,\omega'(u)\, du\Big\}\, dG(\lambda) = 0.$$

The result of Theorem 3.2 suggests that if $G(\lambda) \in \mathcal{G}_1$, then we should write $t(n) = cn + \psi(n)$ so that

(3.19)  $$I(n) = \int_0^\infty (\lambda - c) \exp\{nh(\lambda) + \psi(n)\omega(\lambda)\}\, dG(\lambda),$$

where

(3.20)  $$h(\lambda) = c\,\omega(\lambda) - \int_\gamma^\lambda u\,\omega'(u)\, du.$$

This form of $I(n)$ will be convenient for the application of results from the theory of the asymptotic expansion of integrals.

LEMMA 3.1: Let $g(t)$ be any function integrable with respect to a distribution function
$H(t)$, and let $\varphi(n, t)$ be a function, $\{t_n\}$ a sequence and $c$ a number such that
for all sufficiently small $\epsilon > 0$ there exists a $\delta > 0$ such that for all sufficiently large
$n$, $\varphi(n, t) < \varphi(n, t_n) - \delta$ whenever $|t - c| \ge \epsilon$. Then for any fixed $m$

(3.21) $\int_{-\infty}^{c-\epsilon} g(t) \exp\{n\varphi(n, t)\}\, dH(t) = o(\exp\{n\varphi(n, t_n)\}\, n^{-m}),$

and

(3.22) $\int_{c+\epsilon}^{\infty} g(t) \exp\{n\varphi(n, t)\}\, dH(t) = o(\exp\{n\varphi(n, t_n)\}\, n^{-m}).$
PROOF: For all sufficiently large $n$,

(3.23) $\left| \int_{-\infty}^{c-\epsilon} g(t) \exp\{n\varphi(n, t)\}\, dH(t) \right| \le \exp\{n\varphi(n, t_n) - n\delta\} \int_{-\infty}^{c-\epsilon} |g(t)|\, dH(t),$

and (3.21) follows immediately. Expression (3.22) follows by the same argument.
Lemma 3.2: If g(t) is any function which is four times continuously differentiable 
in some interval containing c, and if g/(c) = 0,g”(c) < 0, then 





ete j _r+l 2 r+ 
/ (t — c)' exp {ng(t)} dt = exp {ng(c)} |n 2 (- ) , 


; 1+ (-1) r ] ee g’""(e) (- 2 a 
Gta) +88 GOCa6 


(r+ 4\f1+(-1)™ 1431 (1 + (-1)" 9 \ 
(4 )( eT )+n (* 27 ~ 7G) 
( (0) » (e42) 
ie (o)) r (-+4) tithe A Bobbhig 6 (*) ,. 


4! g’’(c) } 
This is a standard result from the theory of asymptotic expansions and will not 
be proved here. A proof is outlined, for example, in [9]. 
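
As a numerical illustration of the kind of expansion Lemma 3.2 describes, the sketch below checks only the familiar leading term for $r = 0$, namely $\int \exp\{ng(t)\}\,dt \approx \exp\{ng(c)\}\,(2\pi/(n|g''(c)|))^{1/2}$. The test function $g(t) = \ln t - t$ (so $c = 1$, $g(c) = -1$, $g''(c) = -1$) is an assumption chosen for illustration because the integral over $(0, \infty)$ is known in closed form, $n!/n^{n+1}$; it is not taken from the paper, and the higher-order terms of the lemma are not reproduced here.

```python
import math


def log_exact_integral(n: int) -> float:
    """log of the exact value of the integral of exp{n(ln t - t)} over (0, inf),
    which equals n! / n**(n + 1)."""
    return math.lgamma(n + 1) - (n + 1) * math.log(n)


def log_laplace_leading_term(n: int) -> float:
    """log of exp{n g(c)} * sqrt(2*pi / (n * |g''(c)|)) for the illustrative
    choice g(t) = ln t - t, where c = 1, g(c) = -1 and g''(c) = -1."""
    return -n + 0.5 * math.log(2.0 * math.pi / n)


def ratio(n: int) -> float:
    """Exact integral divided by its leading-order approximation."""
    return math.exp(log_exact_integral(n) - log_laplace_leading_term(n))


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, ratio(n))
```

The ratio approaches 1 from above as $n$ grows, with a relative error of order $1/n$, which is the behaviour the correction terms of the expansion are meant to capture.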

The next lemma establishes the boundedness of $\psi(n)$.

LEMMA 3.3: If $G(\lambda) \in \mathcal{G}_1$ and $t(n) = cn + \psi(n)$, then $\psi(n) = O(1)$.

PROOF: The method of proof is to derive an asymptotic expansion for $I(n)$ as
defined by (3.19) and show that the assumption that $|\psi(n)| \to \infty$ as $n \to \infty$
leads to a contradiction.

The expression h(A) + (¥(n)/n)w(A) is maximized when A = c + (¥(n)/n) 
and, noting that ¥(n)/n + 0asn— & by Theorem 3.2, it is easily verified that 
h(\) + (¥(n)/n)w(d) has the properties of the function g(n, t) of Lemma 3.1 
with ¢ = \ and t, = ¢ + (¥(n)/n). Hence choosing « > 0 such that dG(\) = 
G’(d) dd for X in (c — €, ¢ + €) we have from (3.19) by Lemma 3.1 





(3.24) 





e+e 
I(n) - | (x — ¢) exp {nh(d) + v(n)w(A)}G"(A) ad 


+o (exp {nh (c + Hm) + V(n)o (c + vie) “) . 


for any m 2 0. By the definition of §,, G’(A) = G’(c) + O(A — ec) for 
e—e<A<c+e. Hence 


(3.25) 
c+e 
I(n) = G'(c) / (\ — c) exp {nh(A) + ¥(n)o(d)} dd 


+0 ( (x — c)* exp {nh(A) + v(n)w(A)} an) 


+o (exp {nh (c a vin)) + ¥(n)w (c as vin) nn) ‘ 


Letting r, = ¥(n)/n and t = \ — c — 7,, and expanding the exponent in the 
integrands about t = 0 we obtain 


«—T 


I(n) = exp {nh(c + rn) + ¥(n)ale + 7,)} [ere I ; (t + rp) 

-exp {f[nh”(c + rn) + ¥(n)w”(c + ta)] + E(n, 8} at| 

(3.27) 5 
+0 (/ (t + rn)” exp {f[nh”(c + 7.) + ¥(n)w”(c + 7,)] 


«—T, 


+ &(n, t)} at) + o(exp {nh(c + tr.) + ¥(n)wle + 7.) }n™) 


where | £(n, t) | < knt’ for some k > 0, all n, and all ¢ in (—e — ta, € — tr)’ 

Since r, — 0, changing the range of integration from (—e — ta, € — Ta) to 
(—«*, e*) for 0 < «* < € adds only terms of negligibly small order by Lemma 
3.1. Hence, 7(n) may be expressed in terms of integrals of the form 


(3.28) [ f exp {f[nh”(c + tr.) + ¥(n)w"(c + ta)] + (n, t)} dl 
for r = 0, 1, 2. Applying Lemma 3.2 to these integrals, regarding 


[nh”(c¢ + t.) + ¥(n)w"(c + 7.)] 


as the parameter which becomes large, and noting that the first terms in the 
expansions remain unchanged if either the upper or lower bound for £(n, t) is 
used, we obtain 


I(n) = exp {nh(c + ta) + ¥(n)w(c + 7.)} 


: Ge y(n) ( —e y 
(3.29) nl? \R" (6 + tn) + tnw"(e + Tr) 


+ oS) + (ee) | 


However, if $|\psi(n_i)| \to \infty$ for any subsequence $n_i \to \infty$, (3.29) implies that
$I(n) \ne 0$ for arbitrarily large values of $n$. This, however, is a contradiction,
hence, $\psi(n) = O(1)$.

These lemmas now permit us to complete the proof of the theorem.

PROOF OF THEOREM 3.3: Expanding
(3.30) f(n, dr) = G’(A) exp ¥(n)w(A) 


and regarding h(A) as the function ¢g(n, ¢) appearing in 


about A = ¢ in (3.25) 
= \ and t, = c, we have 


Lemma 3.1 with ¢ 
ct+e 
I(n) = | (A — c) exp {nh(\) + ¥(n)w(c)}[G'(c) 


(3.31) + (A — c)(G"(c) + G’(e)w(c)y(n)) 


+ R(n, d)] dd + ofexp {nh(c)}n™) 


for any m 2 O, where 
o« 0 : 0 
(3.32) R(n,r) = (A — ce) | —f(n,e + O(n, A)(A — c)) — —f(n, c) 
or or 
for some 0 < 6(n,X) < 1. Now let 


(3.33) T(n) = / (A — c) exp {nh(A)}R(n, d) Ad. 


c—e€ 


For any arbitrarily small 6 > 0 we can find an n(6) > O such that | R(n, A) | < 
5(\ — c) for Xin (c — (6), ¢ + n(6)), since 


0/AAf(n, ) = [4"(A) + G'(A)P(n)w'(A)] exp [¥(n)o(r) } 


regarded as a function of \ is continuous at ¢ uniformly in n. Applying Lemma 
3.2 we have 


e+9(3) 
T(n)| = J R(n,dr)(A — c) exp {nh(r)} dd! + of(exp {nh(c)}n 


e—9(8) 
e+9(8) 
(3.34) < s | : (\ — c)* exp {nh(d)} dd + o(exp {nh(c)}n™) 
e—n(6) 
exp | nh(c)} 
= 60 (ce LaM ) ’ 


But 6 may be taken arbitrarily small, hence 


lim sup, (| T(n) |/exp {nh(e)jn- 3 >) 


a 


is less than an arbitrarily small quantity, so that T(n) = o (exp {nh(c)}n™ 
Using this fact and applying Lemma 3.2 to the terms of (3.31) involving (A — an 
and (A — c)* we have, 
I(n) = n*” exp {nh(c) + ¥(n)w(c)} [ ere) (9) 


5/2 
(-35) T'(5/2) + (@’(c) + w(n)w'(c)G'(c)) 


2 3/2 
(-7) 13/2) + o(1) | a 


which yields 


h'"’(c) G” (c) ap ghee 


ch oe w”(c) G”(c) 
2h” (c)w'(c) *(c)w’(c) 


3.36 = | ol . 
( ) y(n) (ale)? ~ @ew"(e) + o(1) 
This establishes the expression (3.16) for $t(n)$ if $G(\lambda) \in \mathcal{G}_1$. We now turn to
consideration of $G(\lambda) \in \mathcal{G}_2$.

THEOREM 3.4: If G(X) € G2, 1 = G(le+) — G(le), and f2 = G(ugt+) — G(ue), 
then 


uw’(u) du In : Ee 


Q< Set tec ei ake ticiiee 
(3.37) i(n) =n w(ug) ant w(l¢) w(Ug) ae w(l¢) 


+ o(1). 


Proor: As in Theorem 3.3 we must find an asymptotic expansion for the 
solution t(n) of 


2 { » 
(3.38) I(n) = | (A — c) exp An )w(A) — n| uo’ (aw) du} dG(r) = 0. 
0 y ) 


For \ < /,, consider 


lg 
(3.39) va, n) = {® (ola) — olte)) + [ witedain. 


n 


Integrating by parts, and applying the mean value theorem and Theorem 3.2, 
we have lim sup,.. 11(A, m) S Ile (w(A) — w(le)) + lew(le) — Aw(A) — 
(le — A)w(A*) where A < A* < lg . Therefore, since w(X) is increasing, we have 
lim SUPp« T1(A, n) < 0 for each A < lg. Hence, for each A < lg 


( 


(A — c) exp 1 Hm)ao(A) — n | uw’ (u) d 





(3.40) (lg — ec) exp {t(n)atle) —n [ uw’ (u) an} 
A~-c 


= exp {nr,(A, n)} — 0, 
le -— ¢ 
as $n \to \infty$, so that





» 
tet (A — c) exp {H(n)a(a) nf uw’ (uw) an} 
(3.41) [ yoreraenyanie 7 +~of, 


; . i 
(le — c) exp {Hn e) - nf uw’ (u) au} 


asn— «. by the dominated convergence theorem. This, however, is equivalent 
to 


lot » 
[ (A — c) exp {(n)a(X) - n | ws’ (u) au} dG(r) 
(3.42) ae 
= {i(le — c) exp {H(n)e(te) ~ nf uw’ (u) au} (1 + o(1)). 


Similarly 


» 
(3.43) lim sup a2) (w(A) — wlte)) — / uw’ (u) au} <0 


for each A > ug, and 


Cs » 
/ (A — ce) exp {(n)u(X) _ nf uw’ (u) au} dG(r) 


a 


(3.44) is 
= (ue — c) exp {t(n)(ua) - nf uw’ (u) au} (1 + o(1)). 


Therefore we must determine t(n) so that 
le 
oiil(e — le) exp {i(n)al) _ nf uw’ (uw) au} (1 + o(1)) 
(3.45) 2 
ug 
= fo(ue — c) exp {t(n)a(we) = nf uw’ (uw) au} (1 + o(1)). 
7 . 
Taking logarithms, we obtain (3.37) as desired.

4. Asymptotic characterization of the Bayes risk and the Bayes sample size.


THeoreM 4.1: Jf F(x|X) & 5, (with k = 1) and G(A) €G., then the Bayes 
risk for fixed n and N is given by 


R(é*,n,N) =n ((s — r)E(A) + & — m+(n— a) [ (A — c) ag.) 


(4.1) +N (x E(A) + mr + (a, — 1) [ (A — c) agn)) 


+ (N —n) = aoe + BN) + (N —- n)o(2) 
and the Bayes sample size is 
N, Ag 


(42) n*(N) = ni (@ — a)(a + eore(e)\" + o(N"*), 


Ag 
2aA G 


where 


iis} 4p ~ fa tate ne Goa [ (x — ¢) dG(a). 


Proor: By applying (1.20) to (1.19) we see that 
(4.4) R(é*,n, N) = n((8; — m1)E(A) + 8 — 12) 
+ N(nE(A) + 12) + (N — n)(@ — n)E{(A — c) L(A, n)}, 
where 
(4.5) L(\, n) = E{é*(S,)| A = NN = PIS, Ss t(n)| A =X, 


where {(n) is defined by (1.24). 

Letting $r(n) = [t(n)]$, where $[x]$ indicates the largest integer less than or equal
to $x$, we may write $r(n) = cn + \varphi(n)$, where $\varphi(n) = O(1)$ by Theorem 3.3.
Now, noting that $F(m \mid \lambda) \to 0$ as $\lambda \to b$ for $m < b$, we may apply Theorem
2.5 to obtain


L(A, n) = (r(n) + 1)r™(r(n) + 1) 


cs eo” d 
4.6) fp Ppa, exp{ —n[ cee) at : 


for values of n large enough so that 0 < r(n) < b. Hence 


b+ b 
El(A — c)L(A, n)] = (r(n) + 1)r'(r(n) + 1) [ ar | 


¢e™ t a du . 
‘Ca + Bor exp{—n | (a + Bu) au} oF Oe? 


g™ 


a te (n) r 
( (n) + 1)r ( (n) so of (a + Bt)» +1 


; a du Y 
-exp{—n [ sell (x — c) dG(a) dt. 


As before, let 


‘ adu 
(48) h(t) = ein (— 13) - l ae 
Then 
(4.9) E[(A — c)L(A, n)] = (r(n) + 1)r“”(r(n) + 1)1(n) 
where 
b 
(4.10) a) « [ K(t, n) exp {nh(t)} df, 
0 
and 
( ie t 
4.1 r( 4 = puneumapegeaseareas — y _ 
1) K(i,n) = —— arr | (x — e) d@(r) 
Now 
; dK(c,n) _ a (si= ©) : 
(4.12) —=" ” + Bc)e™+1 “cla + Be) [ (Xr — c) dG(\), 
and 
mee) 
ee ” eo + Bc)e+1 
— *(¢7( (n)) — 4aBeg(n) + 26°" f° 
a alge (n) — g(n)) — 4aBep(n c = : 
| ere) + ae aay ‘Lo cag) |, 
so that 


(4.14) K(i,n) = Kle,n) + Ut —- ) , Kle, n) 42 ~ — ~ K( (c,n) + R(t, n) 
5 5a 


where, by an argument similar to that used in Theorem 3.3, 
R(t, n) = o((t — c)’) 


uniformly in n. Furthermore by Lemma 3.1 
cre 


(4.15) I(n) = K(t,n) exp {nh(t)} dt + o(exp {nh(c)}n™) 


c—€ 


for all m = 0. Hence, substituting (4.14) in (4.15), treating the remainder of 
(4.14) as was done in Theorem 3.3, and applying Lemma 3.2, we obtain 


s 
{ 


| 

a yy 9 1/2 J 
I(n) an SKS) (-;2 ) om. 
wn h®(c)) | 2 


(4.16) * r( 


») 2 \VPa%(c)  5(r° *7 
( aidan da - 
n E my ( ma) | 4! PRE) + | 


} Nolo 








aK (e,n) s ¥ ve) 1a K(ce, ™(— 2 |+ X( 
re (- 25) ( a JT ~ OE h®(c) 


From (4.8) we have 
O56 i 
(C) = ~ at Be)’ 

(3) i 2a* + 4aBe 
WO) = Bla F Be) ’ 


h(c) = —6 (= + 3a°Be + ?) 


c*(a + Be)$ 
and hence 


seal 1 (Zacts + Bc) 


an 


e(n) 


) exp {nh(c)} Taya 


j | ll a(g'(n) + ¢(n)) , a +abe+ 6° 
(48) -[ @- dda) [1 + 1(aern +6 “Waele + Be) 


PNypetinwsire )+eQ) 
2a | (x — e) dG(A) 
(cia) Gis) 
'\a + Be a+ pe} ’ 


(4.19) exp {nh(c)} = 4 


| Cc cn 
(‘) exp |—cn}, 
\\a 

and from (2.16) we have 


(en + o(n) + 1)r” (en + o(n) + 1) 


| genteimr+t Tr (72 4. cn + ¢g(n) + 1) 


(en +9(n))1r (™) 


Furthermore by (4.8) 


en+¢(n) +1 


(en + g(n))!’ 


(oe ee 
(en + o(n))! (nb — en — g(n)—1)!’ 
(4.20) 


RO +ent+¢(n)+1/2 te 


oe a(y'(n) + o(n)) se a + pe + afe 


( 

peer a + Be) 
i. S TPS 
| 2ne(a + Be) 12nac(a + Be) 


—(en+ ¢(n)+1/2) 


~ ( ’ en+¢(n)+1 fn 
lexp {cn} a c 
cr 


‘3 (: _¢(n)+e(m)_ 1 1, (‘)), 
2ne 12en n 
by Stirling's formula, using the form $\ln n! = (n + \tfrac{1}{2}) \ln n + \tfrac{1}{2} \ln 2\pi - n + 1/(12n) + o(1/n)$. Combining (4.19) and (4.20), we obtain


(N — n)(aq, — n)E{(A — c)L(A,n)} = N(am — 1) [ (A — c) dG(A) 





(4.21) + (aa ake FB) Que) + alr, — a) [ (x — c) d@(a) 
zna 0 
+ (a= Ala + BOO) 4  — 0) 0 (:), 


which upon substitution in (4.4) yields (4.1). 
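
The form of Stirling's formula used in this derivation can be checked numerically. The sketch below compares it with `math.lgamma(n + 1)`, which returns $\ln n!$, and confirms that the remainder is $o(1/n)$; the function names are illustrative and not part of the paper.

```python
import math


def stirling_ln_factorial(n: int) -> float:
    """The form quoted in the text:
    ln n! = (n + 1/2) ln n + (1/2) ln(2*pi) - n + 1/(12 n) + o(1/n)."""
    return (n + 0.5) * math.log(n) + 0.5 * math.log(2.0 * math.pi) - n + 1.0 / (12.0 * n)


def remainder(n: int) -> float:
    """Difference between ln n! and the truncated Stirling form."""
    return math.lgamma(n + 1) - stirling_ln_factorial(n)


if __name__ == "__main__":
    for n in (5, 10, 100):
        # n * remainder(n) should shrink toward 0 if the remainder is o(1/n)
        print(n, remainder(n), n * remainder(n))
```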

It is easily verified that, as long as there are sets with positive probability
on each side of $c$, the Bayes decision rule leads to an incorrect decision with
positive probability whenever $n$ is finite. Hence, for each finite $n$

(4.22) $E\{(\Lambda - c)L(\Lambda, n)\} > \int_0^c (\lambda - c)\, dG(\lambda).$

By (4.21), however, $E\{(\Lambda - c)L(\Lambda, n)\} \to \int_0^c (\lambda - c)\, dG(\lambda)$, as $n \to \infty$. Suppose
that the Bayes sample size $n^*(N) = O(1)$ as $N \to \infty$. Then there exists an
integer $m$ such that

(4.23) $E\{(\Lambda - c)L(\Lambda, n^*(N))\} > E\{(\Lambda - c)L(\Lambda, m)\}.$

Referring to (4.4) and recalling that $a_1 > r_1$ by assumption we see that this
implies that

(4.24) $R(\delta^*, n^*(N), N) > R(\delta^*, m, N),$

for all sufficiently large values of $N$. This, however, contradicts the assertion
that $n^*(N)$ is the Bayes sample size. Similarly, for any subsequence $N_i \to \infty$
the assertion that $n^*(N_i) = O(1)$ leads to a contradiction. Hence $n^*(N) \to \infty$
as $N \to \infty$.


Now for simplicity of notation we write (4.1) as

(4.25) $R(\delta^*, n, N) = A_G n + B_G N + C_G (N - n)\left(\frac{1}{n} + o\!\left(\frac{1}{n}\right)\right)$, as $n \to \infty$.

In order to characterize the Bayes sample size we must consider two cases.

(Case i; $A_G \le 0$): If $A_G \le 0$, the risk is clearly minimized by taking $n$ as
large as possible (i.e., equal to $N$) since $C_G > 0$ by assumption so that
$C_G((1/n^*) + o(1/n^*)) > 0$ for large $N$ since $n^*(N) \to \infty$.

(Case ii; $A_G > 0$): Let the Bayes sample size be written as

(4.26) $n^*(N) = \lambda N^{1/2} + \xi(N)$

where

(4.27) $\lambda = \left(\frac{C_G}{A_G}\right)^{1/2}$

and let

(4.28) $n(N) = \lambda N^{1/2}.$


Now as N > ~, 


R(d*,n*(N),N) — R(d*, n(N), N) 
1 1 
_ Ao (N) + of N'E(N)) 
AN'? + &(N) ; 


which is positive for arbitrarily large values of $N$ unless $\xi(N) = o(N^{1/2})$. If
this expression is positive then the risk using $n(N)$ is less than that using $n^*(N)$,
which contradicts the assertion that $n^*(N)$ is the Bayes sample size. Hence
$\xi(N) = o(N^{1/2})$.
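
The square-root growth of the Bayes sample size in Case (ii) can be illustrated with a small numerical sketch. It assumes a risk of the form $A n + B N + C (N - n)/n$, that is, the leading terms of the risk with the $o(1/n)$ remainder dropped, and uses hypothetical constants $A$, $B$, $C$ that are not taken from the paper. A brute-force search over $n$ is compared with $(C/A)^{1/2} N^{1/2}$.

```python
import math


def risk(n: int, N: int, A: float, B: float, C: float) -> float:
    """Leading terms of the risk, with the o(1/n) remainder dropped (assumption)."""
    return A * n + B * N + C * (N - n) / n


def best_sample_size(N: int, A: float, B: float, C: float) -> int:
    """Brute-force minimiser of the approximate risk over n = 1, ..., N."""
    return min(range(1, N + 1), key=lambda n: risk(n, N, A, B, C))


if __name__ == "__main__":
    A, B, C = 1.0, 0.5, 4.0  # hypothetical constants with A > 0 and C > 0
    for N in (10**2, 10**3, 10**4):
        n_star = best_sample_size(N, A, B, C)
        print(N, n_star, math.sqrt(C / A) * math.sqrt(N))
```

With $A \le 0$ the same search returns $n = N$, in line with Case (i).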


THeoreM 4.2: If F(x|X) € 5: (with k = 1) GA) € Ge and Ag is defined by 
(4.3), then the Bayes sample size is given by 


N 
’ - Ag s9 
(4.30) n*(N) = { " K . 
p= ae 3 ninN + O(1), Ag>0. 


(The definition of K is lengthy and is contained in the proof. ) 
Proor: We rewrite (4.4) as 


R(6*,n,N) = N(r, E(A) + m2) + n((: — 1 )E(A) + 8 — 12) 


+ (N —n)(a — nf ff (A — c) dG(A) +[ (e — A)(1 — L(A, n)) dG(r) 


+ [ = e)LOn) agin)}. 
Now referring to (3.37) let r(n) = [t(n)] = Kin + ¢(n) = O(1) and 


Ue 
/ uw'(u) du 
li Te | 
Ky aay > a (clearly le < Ki < ue) 


For \ < K, we may apply a well known result of asymptotic expansion theory 
(ef. [10]) to (2.15) to obtain 


1 — LO, n) = (s(n) + 1(r(n) +1) (,)" i 


a+ 


Py * adu 
-exp{n (Kn, - [ aay a + o(1)). 


Similarly, we obtain for\ > Ky, 


3 an(K, — d) 
(4.31) 
tes f (n) A nes 
L(A, n) = (r(n) + 1)r'"(r(n) + 1) (2, a) 


1 PN * adu S 
—— — oP" (xk, in(. >3)- I oon) (1 + o0(1)). 


Now, for \ < K,, K, ln (A/(a@ + BA)) — Soa du/(a + Bu) is increasing in i, 
and for \ > K;, it is decreasing. Hence we have 1 — L(A, n) = o(1 — Lile,n)) 
ford < lg, and L(\, n) = o(L(ueg,n)) ford > ug asn— ~. Let A* be which- 
ever of lc and we maximizes K; In \/(a + BA) — foadu/(a + Bu), and let ¢* 
be the weight assigned to that point by G(A). At this stage we discuss in detail 
the case where 


le '@ adu Ug ) “@ adu 
433) KiIn—*_ ~ [ pei, tei ( sas) [ f.. 
( — a + Ble b at Bu ; n( a+ Bu 
The case where equality holds can be treated in a similar manner and leads to 


the same Bayes sample size, as will be noted at the conclusion of the proof. By 
using expression (4.2) for (r(n) + 1)r“(r(n) + 1), we have 


R(*,n,N) = N(r, E(A) + re) + n((s — n)E(A) + (82 — r)) 


(4.32) 





+ (Wi nalle:-0 if (x — c) dG(a) 





\ 


[vis aa | (a + BKi)A*e*(A* — 











Ki(a + BA*) (A* — Ki)+/2eakila + a 
(4.34) A* a+ B8Ki, (a+ BKy )} ) 
exp {n (x, In K, _ 3 in(2 + oe) fr 8 = 0 
\ j 
exp \n (Kn? In — = + K, - *)}, 8 = 0] 
-(1+ 0(1))>. 
) 
Let 
a A*(a + BKy my ( (a + BKi)A*s*(A* — c) ¥. 
4.35) on Gee est eemeaiie 
(488) wha) (*- (a + Bd*) (A* — K;)+/2xaKi(a + BKj) 
\* a+ 8K,, a+ 6K, 
' Kiln g + B In A’ B <0, 


(4.36) —-—={ 
and note that K > 0. We then have, from (4.34) 


R(é*, n, N) = N (nea) + Te + (a; ~ 8) [ (vA —c) ag(n)) 
0 


(4.37) +n (CG — rnJE(A) + &—rmt+(n-— a:) | (A — ©) dG()) 


+ (a, — 3)(N —n) — y(n OO) exp{ — n\ (1 + o(1)) 


as n— 2. Now, by the argument given in the proof of Theorem 4.1, n*{N) — « 
as N — ~. As in Theorem 4.1 we have two cases, (i) Ag S 0 and (ii) Ag > 0. 
Case (i) is treated exactly as before. Case (ii), however, requires a slightly 
different argument as follows: Let n(N) = Kln N — (K/2) nin N and write 
n*(N) = n(N) + &(N). Noting that 


N — n(N) 1 \ 1 : 

(4.38) ————— @X {-z n(N) _ —— as N — @, 
wae "Lec hc eee 

we have 


R(6*,n*(N),N) — R(6*,n(N), N) 


= ~ a) (N= 200 =) 
(4.39) Agi(N ) + (ay 8) ( ~/nN) + EN) y(n(N) + &(N)) 


exp{—f (n(N) + awy)} (1 + o(1)) + 01). 


If (NV) or any subsequence — +, the exponential term is bounded and the 
linear term becomes infinite. If £(N) — —, then 
N — n(N) — &(N) { 1 nish 
exp —= (n(N) + &(N)) 
Vn(N) + &(N) K J 


N-«0(N) f 1 oy 1 \ 
a Ta PM} m0 =p Ee} 


Recalling (4.38), and noting that y(n) is positive and bounded away from zero, 
we see that the exponential term of (4.39) becomes infinite and dominates the 
linear term. Therefore (4.39) is positive if £(N) or any subsequence becomes 
infinite, hence {(N) = O(1). 

If equality holds in (4.33), we observe that letting 


y(n) = (4 + a (+ BK, le $1(c ao le) ) 
Ki(a + Ble) (Ki — lg)~/2eaK,(a + 8K) 


(ye + BK)" (- (a + BRuus Mo f2( to =) ) 


(4.40) 


(4.41) 





Ki(a + Bue) 
leads to (4.37), and hence to (4.30). 
THEOREM 4.3: If $F(x \mid \lambda) \in \mathcal{F}_2$ (with $\theta = 1$), $G(\lambda) \in \mathcal{G}_1$ and $A_G$ is defined by
(4.3), then the Bayes sample size is given by
N, Ag = 0, 
(4.42) n*(N) a ni? (@ - a,)cG’ (c) 
2Ag 


Proor: The method employed here is similar to that used in the proof of 
Theorem 4.2. By Theorem 2.6 


(443) 1-LQ,n) = fon ew f exp {nhit) ia vin)} a 


where h(t) = In (1/t) — (c/t). We note that this h(t) satisfies the conditions 
of the lemmas on asymptotic expansions and a simple calculation shows 


1/2 
) + o(N*”), Ag > 0. 


R(*, n, N) =n (a — r,)E(A) + & — t2+ (ry — a) [ (rv ~ e)dG00)) 


(4.44) +N (x E(A) + ro + (aq — 14) r (A — ce) ag(n)) 


+ (a — n)(N — n) (9 + 0 (:)) . 
2n n 


This now may be written as in (4.25) and exactly the same argument proves 
the theorem. 

TueoreM 4.4: If F(x|Xd) € S52 (with 9 
(4.3), then 


1), GA) € G2 and Ag is defined by 


N, A G 


IIA 


0, 
(4.45) n*(N) = 


to| 


In In N + O(1), Ag > 0. 


Proor: We define \* as being whichever of lg and ug maximizes —(K,/A) + 
In (1/A) and let 


: 1 K K 
(4.46) mt ie Ins ~ (& ~ 1). 


The remainder of the proof parallels that of Theorem 4.2. 


Acknowledgment. The authors are indebted to Professor James Hannan for 
helpful discussions concerning the results in the first part of Section 2. 
OPTIMUM TOLERANCE REGIONS AND POWER WHEN
SAMPLING FROM SOME NON-NORMAL UNIVERSES¹


By IRWIN GUTTMAN


McGill University 


1. Introduction and summary. We assume familiarity with the concepts defined
in [1] and [2], where optimum $\beta$-expectation tolerance regions and their
power functions were found for k-variate normal distributions. The method used
is to reduce this problem to that of solving an equivalent hypothesis testing
problem. It is the purpose of this paper to find optimum $\beta$-expectation tolerance
regions for the single and double exponential distributions, and to exhibit the
corresponding power functions.

Let $X = (X_1, \ldots, X_n)$ be a random sample point in $n$ dimensions, where
each $X_i$ is an independent observation, distributed by some continuous probability
distribution function. It is often desirable to estimate on the basis of such a
sample point a region, say $S(X_1, \ldots, X_n)$, which contains a given fraction $\beta$ of
the parent distribution. We usually seek to estimate the center $100\beta\%$ of the
distribution and/or one of the $100\beta\%$ tails of the parent distribution.


2. The single exponential distribution. The probability density function of the
single exponential is given by

(2.1) $f(x)\, dx = \frac{1}{\sigma} \exp\left\{-\frac{x - \mu}{\sigma}\right\} dx, \quad x \ge \mu.$

If we wish to construct tolerance regions $S(x_1, \ldots, x_n)$ which have the ability
to pick up sets on the right hand tail of (2.1), then a reasonable choice of "the
measure of desirability" $Q$ is

(2.2) $dQ_\alpha = \frac{1}{\alpha\sigma} \exp\left\{-\frac{y - \mu}{\alpha\sigma}\right\} dy, \quad y \ge \mu,$

where $\alpha > 1$. This clearly gives more measure to sets on the right hand tail of
(2.1). The problem now separates itself into three cases.

Received October 7, 1958.

¹ Prepared in connection with research sponsored by the Office of Naval Research, while
the author was at Princeton University.

Case I. $\mu$ known, $\sigma$ unknown. Without loss of generality, put $\mu = 0$. We consider
the analogous hypothesis testing problem (see p. 171 of [1]). Let $X_1, \ldots, X_n$,
$Y$ be independent, each $X_i$ having the distribution (2.1), and let $Y$ have the
distribution (2.2), all with $\mu = 0$. If a tolerance region is desired which tends to
cover the right hand tail of (2.1), then the hypothesis testing problem has the
form

(2.3) Hypothesis: $\alpha = 1$; Alternative: $\alpha = \alpha_1 > 1$.

If $\bar{x} = n^{-1} \sum_{i=1}^{n} x_i$, then it can easily be verified that $(\bar{x}, y)$ is a sufficient statistic
for this problem. We now apply the invariance method expressed in terms of this
sufficient statistic. Consider the group $G$ of transformations given by

(2.4) $G = \{\, \bar{x}' = c\bar{x},\ \ y' = cy;\ \ c \in (0, \infty) \,\}.$


The function $W = y/\bar{x}$ is invariant under this group, and is in fact the maximal
invariant function. It is shown in Appendix 1 that the density element of $W$ is²

(2.5) $g(w; \alpha)\, dw = \alpha^n n^{n+1} (n\alpha + w)^{-(n+1)}\, dw.$

In terms of $W$, the hypothesis and alternative of (2.3) are simple, and we now
apply the Neyman-Pearson fundamental lemma. Then, the most powerful test
function $\phi(w)$ is based on the probability ratio

$\dfrac{\alpha_1^n n^{n+1} (n\alpha_1 + w)^{-(n+1)}}{n^{n+1} (n + w)^{-(n+1)}},$

or, as this ratio is a monotone increasing function of $w$, $\phi(w)$ is based on $W$.
Hence, the most powerful invariant test function is

(2.6) $\phi(W) = \begin{cases} 1 & \text{if } W \ge a_\beta, \\ 0 & \text{if } W < a_\beta, \end{cases}$

where the $a_\beta$ are chosen to give the test size $\beta$, that is

(2.7) $\int_{a_\beta}^{\infty} g(w; 1)\, dw = \beta.$

Because the test does not depend on $\alpha_1$, provided it is greater than 1, and because
it is based on the maximal invariant function, our most powerful invariant
test function is minimax, most stringent and similar of size $\beta$. From the definition
of $W$ and following [1], we have that the $\beta$-expectation tolerance region which is
minimax and most stringent is given by

(2.8) $S(x_1, \ldots, x_n) = [a_\beta \bar{x}, \infty).$

Values of $a_\beta$ for n = 1(1)20, 40 and 60 are given in Table I, for $\beta$ = .99, .95,
.90 and .75. The power of the procedure summarized by (2.8) is discussed in
Section 4.
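
Since the survival function of $W$ under $\alpha = 1$ follows from (2.5) as $(1 + w/n)^{-n}$, condition (2.7) can be solved in closed form, $a_\beta = n(\beta^{-1/n} - 1)$. The sketch below is an illustrative check rather than the author's computing procedure; it reproduces entries of Table I and confirms that the region $[a_\beta \bar{x}, \infty)$ has expected coverage $\beta$. The function names are hypothetical.

```python
def a_beta(n: int, beta: float) -> float:
    """Solve (1 + a/n)**(-n) = beta for a, i.e. condition (2.7) with g(w; 1)."""
    return n * (beta ** (-1.0 / n) - 1.0)


def expected_coverage(n: int, a: float) -> float:
    """Expected content of [a * xbar, inf) under density (2.1) with mu = 0:
    E[exp(-a * xbar / sigma)] = (1 + a/n)**(-n), free of sigma."""
    return (1.0 + a / n) ** (-n)


if __name__ == "__main__":
    for n in (1, 2, 10, 60):
        print(n, [round(a_beta(n, b), 6) for b in (0.75, 0.90, 0.95, 0.99)])
```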

Case II. $\mu$ unknown, $\sigma$ known. Let the known value of $\sigma$ be $\sigma_0$. The sufficient
statistic is $(x_{(1)}, y)$, where $x_{(1)} = \min_{1 \le i \le n} x_i$, each $X_i$ has distribution (2.1) with
$\sigma = \sigma_0$, and $Y$ has the distribution (2.2) with $\sigma = \sigma_0$. Under the group of
transformations

(2.9) $G = \{\, x_{(1)}' = x_{(1)} + a,\ \ y' = y + a;\ \ a \in R^1 \,\}$

² Inspection of $g(w; \alpha)$ will show that it is related to Snedecor's F distribution with
(2, 2n) degrees of freedom, where $W = \alpha F$.
TABLE I

Tolerance Factors $a_\beta$ for single exponential distributions (2.1), $\mu$ known, $\sigma$
unknown; sample size n.

  n    β = .75   β = .90   β = .95   β = .99
  1    .333333   .111111   .052631   .010101
  2    .309401   .108185   .051957   .010076
  3    .301927   .107232   .051734   .010067
  4    .298280   .106760   .051624   .010063
  5    .296119   .106478   .051557   .010061
  6    .294690   .106291   .051513   .010059
  7    .293675   .106158   .051482   .010058
  8    .292917   .106057   .051458   .010057
  9    .292329   .105980   .051440   .010056
 10    .291860   .105918   .051425   .010055
 11    .291476   .105867   .051413   .010055
 12    .291158   .105824   .051403   .010055
 13    .290889   .105789   .051395   .010054
 14    .290658   .105758   .051387   .010054
 15    .290458   .105731   .051381   .010054
 16    .290284   .105708   .051376   .010054
 17    .290131   .105688   .051371   .010053
 18    .289993   .105670   .051366   .010053
 19    .289871   .105653   .051363   .010053
 20    .289761   .105638   .051359   .010053
 30    .289066   .105546   .051337   .010052
 40    .288719   .105499   .051326   .010052
 60    .288373   .105453   .051315   .010051

the statistic $W = (x_{(1)} - y)/\sigma_0$ is clearly a maximal invariant for the problem
(2.3), and its distribution is given by

(2.10) $h(w; \alpha)\, dw = \begin{cases} \dfrac{n}{\alpha n + 1}\, e^{-nw}\, dw & \text{if } w > 0, \\[1ex] \dfrac{n}{\alpha n + 1}\, e^{w/\alpha}\, dw & \text{if } w < 0. \end{cases}$

(This is proved in Appendix 2)³. An analysis similar to that above shows that,
for ability to pick up the right hand tail of (2.1), a minimax and most stringent
tolerance region of $\beta$-expectation is

(2.11) $S(x_1, \ldots, x_n) = [x_{(1)} - b_\beta \sigma_0, \infty),$

³ Inspection of $h(w; \alpha)$ will show that it is a weighted combination of two densities that
are simply related to $\chi^2$ with 2 degrees of freedom, where $\chi^2_2 = 2nW$ for $W > 0$, and
$\alpha\chi^2_2 = -2W$ for $W < 0$.
TABLE II


Tolerance Factors $b_\beta$ for single exponential distribution, $\mu$ unknown, $\sigma$ known;
sample size n












1.60944 


. 305430 


-059446 


000000 
-010050 




018349 
025318 
-031253 
036368 
-040822 
.044736 
. 230524 -048202 
. 233615 -051293 
. 236389 054067 
. 238892 056570 
. 254892 072571 .039039 
. 262989 -080668 ‘ -022290 
- 271152 088831 -008238 


where the $b_\beta$ are chosen to give the region size $\beta$, that is the $b_\beta$ are such that

(2.12) $\int_{-\infty}^{b_\beta} h(w; 1)\, dw = \beta.$

Values of $b_\beta$ for n = 1(1)20, 40 and 60 are given in Table II for $\beta$ = .99, .95,
.90 and .75. The power of the procedure as summarized by (2.11) is discussed
in Section 4.
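
Condition (2.12) also has a closed-form solution, which the following sketch evaluates. It assumes the density (2.10) with $\alpha = 1$, for which $P(W \le b) = \frac{n}{n+1} e^{b}$ when $b \le 0$ and $1 - \frac{1}{n+1} e^{-nb}$ when $b \ge 0$. The function name is illustrative; the values it returns agree with entries that remain legible in Table II (for example 1.60944, .305430 and .000000).

```python
import math


def b_beta(n: int, beta: float) -> float:
    """Solve P(W <= b) = beta under h(w; 1) of (2.10).

    P(W <= 0) = n / (n + 1); below that level the left (w < 0) branch
    applies, above it the right (w > 0) branch.
    """
    p_zero = n / (n + 1.0)
    if beta <= p_zero:
        # (n / (n + 1)) * exp(b) = beta, with b <= 0
        return math.log(beta * (n + 1.0) / n)
    # 1 - exp(-n b) / (n + 1) = beta, with b > 0
    return -math.log((1.0 - beta) * (n + 1.0)) / n


if __name__ == "__main__":
    for n in (1, 3, 6, 10):
        print(n, [round(b_beta(n, b), 6) for b in (0.75, 0.90, 0.95, 0.99)])
```

A negative $b_\beta$ means the lower end point of the region lies above the smallest observation, which happens once $n/(n+1)$ exceeds $\beta$.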

Case III. $\mu$ and $\sigma$ unknown. The sufficient statistic is given by $(x_{(1)}, s, y)$,
where $x_{(1)} = \min_{1 \le j \le n} x_j$, $y$ is the random variable with density (2.2), and $s$ is
given by

(2.13) $s = (n - 1)^{-1} \sum_{j=1}^{n} (x_j - x_{(1)}).$


Under the group of transformations

(2.14) $G = \{\, x_{(1)}' = c\,x_{(1)} + a,\ \ s' = cs,\ \ y' = cy + a;\ \ a \in R^1,\ c \in (0, \infty) \,\}$

a maximal invariant is found to be


(2.15) W = Tat matt 
The density element of W is 

STINT eFeS IR °>° 
n+ 1 dw 
(na + 1 {i — a+ 1I)(n — De w)”’ 
(This is proved in Appendix 3)*. An analysis similar to that above shows that the 


minimax most stringent tolerance region of 6-expectations, having ability to pick 
up the right hand tail of (2.1), is 


(2.16) k(w;a) dw = 
w <0. 





(2.17) $S(x_1, \ldots, x_n) = [x_{(1)} - c_\beta' s, \infty),$

where $c_\beta' = (n^{-1} + 1)c_\beta$, and the $c_\beta$ are such that

$\int_{-\infty}^{c_\beta} k(w; 1)\, dw = \beta.$

The values of $c_\beta$ are given in Table III for n = 1(1)20, 40 and 60 for $\beta$ = .75,
.90, .95 and .99, while the power function for (2.17) is discussed in Section 4.


3. The double exponential distribution. The density of this function is given
by

(3.1) $\frac{1}{2\sigma} \exp\left\{-\frac{|x - \mu|}{\sigma}\right\} dx, \quad -\infty < x < \infty.$

We discuss the case of $\mu$ known, say $\mu_0$. It is easily shown that if a sample of
$n$ independent observations be drawn from (3.1), then the sampling distribution
of the statistic

(3.2) $T = \sum_{i=1}^{n} |X_i - \mu_0|$

has the density

(3.3) $\frac{1}{\sigma^n \Gamma(n)}\, t^{n-1} e^{-t/\sigma}\, dt.$

⁴ Inspection of $k(w; \alpha)$ will show that it is a weighted combination of two densities
that are simply related to an F distribution with 2, 2(n − 1) degrees of freedom, where
$(n + 1)W = F$ if $W > 0$, and $n\alpha F = -(n + 1)W$ if $W < 0$.
TABLE III


Tolerance Factors $c_\beta$ for single exponential distribution, $\mu$ and $\sigma$ unknown;
sample size n





16667 
.000000 .387426 
065238 194941 
106760 108976 
135330 .061617 
156148 | .032478 
171978 .013270 
184415 .000000 
194424 .010056 


. 202698 .018366 
209611 .025349 
.215486 .031293 
. 220539 .036418 
224931 .040881 
. 228784 .044802 
.232192 .048275 
. 235227 .051371 
. 237948 .054148 
.240400 056654 


- 256016 -072661 .018509 
. 263878 080751 .026609 
-271776 — .088898 — .034774 














Further, $T$ is sufficient for $\sigma$. If the tolerance region is constructed so that it
has ability to pick up the center part of (3.1), a reasonable choice for the 'measure
of desirability' is the measure $Q$, defined by

(3.4) $dQ = \frac{1}{2\alpha\sigma} \exp\left\{-\frac{|y - \mu_0|}{\alpha\sigma}\right\} dy,$

where $-\infty < y < \infty$ and $\alpha$ is such that $0 < \alpha < 1$. The analogous hypothesis
testing problem can now be put in the form

(3.5) Hypothesis: $\alpha = 1$; Alternative: $\alpha = \alpha_1$, $0 < \alpha_1 < 1$.

We use the principle of invariance. The maximal invariant under the group of
transformations

(3.6) $G = \{\, t' = ct,\ \ (y - \mu_0)' = c(y - \mu_0);\ \ c \in (0, \infty) \,\}$
is the statistic $W = |y - \mu_0|/t$, and its density element is given by

(3.7) $p(w; \alpha)\, dw = n\alpha^n (\alpha + w)^{-(n+1)}\, dw.$

(This is proved in Appendix 4)⁵. In terms of $W$ the problem (3.5) is a simple
hypothesis versus a simple hypothesis and clearly $(t, y)$ is sufficient. Applying
the Neyman-Pearson Fundamental Lemma, the most powerful invariant test is

(3.8) $\phi(W) = \begin{cases} 1 & \text{if } W \le d_\beta, \\ 0 & \text{otherwise.} \end{cases}$

The test does not depend on $\alpha_1$ (so long as $0 < \alpha_1 < 1$), and, because the test
is based on the maximal invariant, it is minimax, most stringent, and similar of
size $\beta$. The $d_\beta$ are chosen to give the test size $\beta$. Again following [1], we have that the
minimax most stringent tolerance region of $\beta$-expectation with ability to pick
up the center $100\beta\%$ of (3.1) is

(3.9) $S(x_1, \ldots, x_n) = [\mu_0 - d_\beta t,\ \mu_0 + d_\beta t],$

where the $d_\beta$ are such that

(3.10) $\int_0^{d_\beta} p(w; 1)\, dw = \beta.$

Values of $d_\beta$ for n = 1(1)20, 40 and 60 for $\beta$ = .75, .90, .95 and .99 are given
in Table IV. The power of (3.9) is discussed in the next section.
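
Integrating (3.7) with $\alpha = 1$ gives $P(W \le d) = 1 - (1 + d)^{-n}$, so (3.10) can be solved as $d_\beta = (1 - \beta)^{-1/n} - 1$. The short sketch below, with an illustrative function name, evaluates this expression; it reproduces legible entries of Table IV such as 19.0000, 3.47214 and 1.71442.

```python
def d_beta(n: int, beta: float) -> float:
    """Solve 1 - (1 + d)**(-n) = beta for d, i.e. condition (3.10) with p(w; 1)."""
    return (1.0 - beta) ** (-1.0 / n) - 1.0


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 10):
        print(n, [round(d_beta(n, b), 6) for b in (0.75, 0.90, 0.95, 0.99)])
```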


4. Formulation of the power functions. Suppose sampling from (2.1), where 

A. Case 1. $\mu$ known, $\sigma$ unknown. For this case, the solution of the corresponding
hypothesis testing problem is given by (2.6). The power of $\phi$, $P_1$ (see p. 170
of [1] and p. 774 of [2]), and hence of $S$, is determined by the distribution of $W$
under the alternative of (2.3). That is, we have

(4.1) $P_1 = P_{\text{Alt.}}(W \ge a_\beta) = \int_{a_\beta}^{\infty} g(w; \alpha_1)\, dw,$

where $g(w; \alpha)$ is defined by (2.5), $a_\beta$ is given in Table I, and $\alpha_1 > 1$. The power
measures the 'degree of confidence' we have that $S(X_1, \ldots, X_n)$ covers the
right hand $100\beta\%$ of (2.1) when the desirability of covering this set is given by


l —)(s—s) 
Q.(S) = fei - dx, + «. 
2 ag 


For example, if it is 99.5% desirable to cover the right hand 90% of (2.1), then
$\alpha_1 = 21.01938$ and the power is found by (4.1) using this value of $\alpha_1$. Values of
the power for the regions $S$ (as given by (2.8)) are given in Table V when the
desirability of the right hand $100\beta\%$ sets is .995.
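
The worked example can be reproduced with a short calculation. The sketch assumes that the desirability of the right hand $100\beta\%$ set, $\{x \ge \mu + \sigma \ln(1/\beta)\}$, is its $Q_\alpha$ measure under (2.2), which equals $\beta^{1/\alpha}$; setting this to .995 gives $\alpha_1 = \ln\beta / \ln .995$, matching the value 21.01938 quoted for $\beta = .90$. The power then follows from (4.1) and (2.5) as $(1 + a_\beta/(n\alpha_1))^{-n}$. Function names are illustrative.

```python
import math


def alpha_for_desirability(beta: float, q: float = 0.995) -> float:
    """alpha_1 such that Q_alpha of the right hand 100*beta% set equals q,
    assuming that measure is beta**(1/alpha)."""
    return math.log(beta) / math.log(q)


def a_beta(n: int, beta: float) -> float:
    """Tolerance factor solving (2.7): (1 + a/n)**(-n) = beta."""
    return n * (beta ** (-1.0 / n) - 1.0)


def power_case1(n: int, beta: float, q: float = 0.995) -> float:
    """P(W >= a_beta) under g(w; alpha_1), i.e. (1 + a/(n*alpha_1))**(-n)."""
    alpha1 = alpha_for_desirability(beta, q)
    return (1.0 + a_beta(n, beta) / (n * alpha1)) ** (-n)


if __name__ == "__main__":
    print(alpha_for_desirability(0.90))  # compare with 21.01938 in the text
    print(power_case1(1, 0.75))          # compare with the first entry of Table V
```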


⁵ Inspection of $p(w; \alpha)$ will show that it is simply related to the F distribution with
(2, 2n) degrees of freedom, where $nW = \alpha F$.





TABLE IV 


Tolerance Factors $d_\beta$ for the double exponential distributions mean and variance
unknown; sample size n





19.0000 98.9995 
3.47214 8.99998 
1.71442 3.64158 
1.11474 2.16227 

820564 1.51188 
647549 1.15443 
.534127 . 930696 
454215 .778278 
.394951 .668070 
.349283 .584892 


oo Whe 


~I 


- 134312 . 232847 . 313032 -519910 
122462 211528 . 283569 -467799 
112531 -193777 - 259155 -425102 
- 104090 . 178769 - 238599 - 389495 
-096825 - 165914 - 221055 -359356 
-090507 154782 - 205908 333521 
-084964 , - 192700 -311134 
-080060 1 -181080 - 291549 
-075691 ‘ - 170780 -274275 
.071773 al - 161586 - 258925 








.047294 .079775 -105014 165914 
035265 -059254 -077770 - 122018 
-023374 -039122 -051196 -079775 








TABLE V 


Power of $\beta$-expectation tolerance regions, $[a_\beta \bar{x}, \infty)$, when sampling from the
single exponential distribution, sample size n


Measure of Desirability = .995 


21.01937897 10.23299086 2.005037823 





95 99 


9942255 9947417 | 9948830 .9949873 
.9947577 9949156 | .9949614 .9949958 
9948565 9949496 9949769 | 9949975 
. 9948982 . 9949642 . 9949837 .9949982 
.9949289 | .9949751 .9949885 |  — .9949989 
.9949527 .9949839 9949928 | .9949994 
.9949772 .9949924 | 9949968 .9950000 
9949897 .9949968 .9950000 | —-.9950000 


TABLE VI 


Power of $\beta$-expectation tolerance regions, $[x_{(1)} - b_\beta \sigma_0, \infty)$, when sampling from
the single exponential distribution, sample size n


Measure of Desirability = .995 


57.39245356


21.01937897 | 2.005037823 


.9914372 .9909171 . 9933444 
.9942255 .9937556 ; .9942980 
.9946996 9943447 ; 9945578 
.9948414 9945995 : .9946791 
.9949202 9947892 j 9947744 
.9949637 .9949042 : .9948512 
.9949907 .9949755 : .9949305 
.9949977 9949938 9949712 


SBaASrawe 











TABLE VII

Power of $\beta$-expectation tolerance regions, $[x_{(1)} - c_\beta s, \infty)$, when sampling from the
single exponential distribution, sample size $n$

Measure of Desirability = .995

(Rows correspond to increasing sample sizes $n$; the row labels and one column are illegible in the source.)

  α      57.39245356    21.01937897    2.005037823

          .9930295       .9930122       .9940120
          .9941230       .9940379       .9944568
          .9944932       .9943908       .9946278
          .9946794       .9945693       .9947184
          .9947891       .9946772       .9947744
          .9949021       .9948218       .9948512
          .9949719       .9949525       .9949305
          .9949912       .9949881       .9949712





Case 2. $\mu$ unknown, $\sigma$ known. An analysis similar to the above shows that the
power of (2.11) is given by

$$P_\alpha = P_{\alpha\sigma_0}(W \le b_\beta) = \int_{-\infty}^{b_\beta} h(w;\alpha)\,dw, \tag{4.2}$$

where $h(w;\alpha)$ is given by (2.10) and $b_\beta$ is given in Table II. Values of (4.2) for
the regions (2.11) are given in Table VI.
TABLE VIII

Power of $\beta$-expectation tolerance regions, $[\mu_0 - d_\beta t, \mu_0 + d_\beta t]$, when sampling
from the double exponential distribution, sample size $n$

Measure of Desirability = .995

  α      .261648041    .434587989    .565411999    .869175979

(Most entries and the row labels are illegible in the source; the surviving entries follow in source order.)

  .9197804    .9711014
  .9707346    .9847458
  .9815020
  .9858373
  .9888911
  .9911096
  .9931575
  .9941067




Case 3. $\mu$ and $\sigma$ unknown. Proceeding as above, one finds that

$$P_\alpha = P_{\alpha\sigma}(W \le c_\beta^{*}) = \int_{-\infty}^{c_\beta^{*}} k(w;\alpha)\,dw, \tag{4.3}$$

where $k(w;\alpha)$ is given by (2.16) and the values of $c_\beta^{*}$ can be found from Table
III using the relationship $c_\beta = (n^{-1} + 1)c_\beta^{*}$. Values of (4.3) for 99.5% desirability
of the right-hand $100\beta\%$ sets are given in Table VII.

B. The Double Exponential Distribution. As before, the power of the regions
(3.9) is given by the power of the test (3.8) under the alternative hypothesis of
(3.5), that is by

$$P_\alpha = P_{\alpha\sigma}(W \le d_\beta) = \int_{0}^{d_\beta} p(w;\alpha)\,dw, \tag{4.4}$$

where $p(w;\alpha)$ is given by (3.7) and $d_\beta$ is tabulated in Table IV. Values of (4.4)
are given in Table VIII.
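
As a hedged illustration of (4.4), the integral can be evaluated in closed form from the density $p(w;\alpha) = n\alpha^{n}/(\alpha+w)^{n+1}$ derived in A.4, giving $1 - (\alpha/(\alpha + d_\beta))^{n}$. The sketch below is not the author's computation; the function names are hypothetical, and pairing the Table IV entry $d_\beta = 19.0000$ ($n = 1$, $\beta = .95$) with the Table VIII column headed $\alpha = .565411999$ is an assumption made only for the check.

```python
def double_exp_power(d, alpha, n):
    """Closed form of (4.4): integral of n*alpha**n / (alpha + w)**(n + 1) over [0, d]."""
    return 1.0 - (alpha / (alpha + d)) ** n


def midpoint_integral(f, lo, hi, steps=100_000):
    """Plain midpoint rule, used only as an independent numerical check."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    # Assumed pairing: n = 1, d = 19.0000 (Table IV), alpha = .565411999 (Table VIII header).
    n, alpha, d = 1, 0.565411999, 19.0

    def density(w):
        return n * alpha**n / (alpha + w) ** (n + 1)

    print(double_exp_power(d, alpha, n))       # closed form
    print(midpoint_integral(density, 0.0, d))  # numerical integral of p(w; alpha)
```

Under that assumed pairing the closed form agrees, to the printed precision, with the entry .9711014 that survives in Table VIII.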


5. Acknowledgement. The author wishes to express his thanks to Professor
D. A. S. Fraser of the University of Toronto for valuable discussion and
encouragement, and to Mr. A. Cseuz of the University of Alberta for doing the
computations.


APPENDIX

A. 1. Derivation of (2.5). To restate, the distribution of $Y$ is given by (2.2)
with $\mu = 0$. Define $\bar X = n^{-1}\sum_{i=1}^{n} X_i$, where the $X_i$ are independent observations
from (2.1), with $\mu = 0$. It is well known that the density element of $\bar X$ is

$$\frac{1}{\sigma^{n}}\,\frac{n^{n}}{\Gamma(n)}\,e^{-n\bar x/\sigma}\,\bar x^{\,n-1}\,d\bar x.$$

Hence the joint density element of $\bar x$ and $y$ is

$$\frac{n^{n}}{\alpha\sigma^{n+1}\Gamma(n)}\,\bar x^{\,n-1}\,e^{-n\bar x/\sigma}\,e^{-y/(\alpha\sigma)}\,d\bar x\,dy.$$

We make the transformation $w = y/\bar x$, $z = y$. (The absolute value of the Jacobian
is $z/w^{2}$.) The joint density element of $W$ and $Z$ is

$$g(w, z)\,dw\,dz = \frac{n^{n}}{\alpha\sigma^{n+1}\Gamma(n)}\,\frac{z^{n}}{w^{n+1}}\,e^{-nz/(\sigma w)}\,e^{-z/(\alpha\sigma)}\,dw\,dz.$$

On integrating out $z$ we have $g(w;\alpha)\,dw = \alpha^{n} n^{n+1}(n\alpha + w)^{-(n+1)}\,dw$. It is
easily verified that $g(w;\alpha)$ is a probability density.
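
The result of A. 1 can be checked by simulation. The following is a minimal sketch, assuming (consistently with the derivation) that (2.1) is the exponential density with scale $\sigma$ and (2.2) the exponential density with scale $\alpha\sigma$, both with $\mu = 0$; the function names and the chosen parameter values are illustrative only. Integrating $g(w;\alpha)$ from 0 to $w$ gives the distribution function $1 - (n\alpha/(n\alpha + w))^{n}$, which is compared with the empirical frequency.

```python
import random


def g_cdf(w, alpha, n):
    """P(W <= w), obtained by integrating g(w; alpha) from 0 to w."""
    return 1.0 - (n * alpha / (n * alpha + w)) ** n


def simulate_w(alpha, n, sigma, reps, seed=0):
    """Draw W = Y / Xbar with X_i ~ exponential(scale sigma), Y ~ exponential(scale alpha*sigma)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # random.expovariate takes the rate, i.e. 1 / scale.
        xbar = sum(rng.expovariate(1.0 / sigma) for _ in range(n)) / n
        y = rng.expovariate(1.0 / (alpha * sigma))
        draws.append(y / xbar)
    return draws


if __name__ == "__main__":
    alpha, n = 2.0, 5
    draws = simulate_w(alpha, n, sigma=1.0, reps=20_000)
    empirical = sum(w <= 3.0 for w in draws) / len(draws)
    print(empirical, g_cdf(3.0, alpha, n))  # the two values should be close
```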
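A. 2. Derivation of (2.10). Here the distribution of $Y$ is given by (2.2) with
$\sigma = \sigma_0$. Define $X_{(1)} = \min_i X_i$, where the $X_i$ are $n$ independent observations from
(2.1) with $\sigma = \sigma_0$. It is well known that the density element of $X_{(1)}$ is given by

$$\frac{n}{\sigma_0}\,e^{-n(x_{(1)} - \mu)/\sigma_0}\,dx_{(1)}.$$

Let $s = n(x_{(1)} - \mu)/\sigma_0$ and $z = (y - \mu)/(\alpha\sigma_0)$. Then the density elements of $s$
and $z$ are respectively $e^{-s}\,ds$ and $e^{-z}\,dz$, and their joint density element is
$e^{-s-z}\,ds\,dz$. Make the transformation

$$w = \frac{s}{n} - \alpha z \quad\text{and}\quad t = \frac{s}{n} + \alpha z.$$

Note that $w = (x_{(1)} - y)/\sigma_0$. The absolute value of the Jacobian is $n/2\alpha$. Hence

$$h(w, t)\,dw\,dt = \frac{n}{2\alpha}\exp\Big\{-\frac{n}{2}(w + t) - \frac{1}{2\alpha}(t - w)\Big\}\,dw\,dt, \qquad t > |w|.$$

Integrating out $t$,

$$h(w;\alpha)\,dw = \begin{cases} \dfrac{n}{n\alpha + 1}\,e^{-nw}\,dw & \text{if } w > 0, \\[2ex] \dfrac{n}{n\alpha + 1}\,e^{w/\alpha}\,dw & \text{if } w < 0, \end{cases}$$

and it is easily verified that $h(w;\alpha)$ is a density.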
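A. 3. Derivation of (2.16). Using A. 2., it is easily seen that the density element
of $z = (x_{(1)} - y)/(1 + n^{-1})$ is

$$\begin{cases} \dfrac{n+1}{\sigma(n\alpha + 1)}\,e^{-(n+1)z/\sigma}\,dz & \text{if } z > 0, \\[2ex] \dfrac{n+1}{\sigma(n\alpha + 1)}\,e^{(n+1)z/(n\alpha\sigma)}\,dz & \text{if } z < 0, \end{cases}$$

where $\sigma$ is now unknown. The density element of

$$s = (n - 1)^{-1}\sum_{i=1}^{n}(x_i - x_{(1)})$$

is given by

$$\Big(\frac{n-1}{\sigma}\Big)^{n-1}\frac{s^{n-2}}{\Gamma(n-1)}\,e^{-(n-1)s/\sigma}\,ds$$

([3], p. 54). Hence the joint density element of $z$ and $s$ is

$$\frac{n+1}{\sigma(n\alpha+1)}\Big(\frac{n-1}{\sigma}\Big)^{n-1}\frac{s^{n-2}}{\Gamma(n-1)}\,e^{-(n-1)s/\sigma - (n+1)z/\sigma}\,ds\,dz \quad \text{if } z > 0$$

and

$$\frac{n+1}{\sigma(n\alpha+1)}\Big(\frac{n-1}{\sigma}\Big)^{n-1}\frac{s^{n-2}}{\Gamma(n-1)}\,e^{-(n-1)s/\sigma + (n+1)z/(n\alpha\sigma)}\,ds\,dz \quad \text{if } z < 0.$$

Making the transformation $w = z/s$ and $r = z$ (the absolute value of the Jacobian
is $|r|/w^{2}$), the joint distribution of $w$ and $r$ becomes

$$k(w, r)\,dw\,dr = \begin{cases} \dfrac{n+1}{\sigma(n\alpha+1)}\Big(\dfrac{n-1}{\sigma}\Big)^{n-1}\dfrac{r^{n-1}}{w^{n}\,\Gamma(n-1)}\,e^{-(n-1)r/(\sigma w) - (n+1)r/\sigma}\,dw\,dr & \text{if } w > 0, \\[2ex] \dfrac{n+1}{\sigma(n\alpha+1)}\Big(\dfrac{n-1}{\sigma}\Big)^{n-1}\dfrac{|r|^{n-1}}{|w|^{n}\,\Gamma(n-1)}\,e^{-(n-1)r/(\sigma w) + (n+1)r/(n\alpha\sigma)}\,dw\,dr & \text{if } w < 0. \end{cases}$$

Integrating out $r$,

$$k(w;\alpha)\,dw = \begin{cases} \dfrac{n+1}{n\alpha+1}\,\dfrac{dw}{[1 + (n+1)(n-1)^{-1}w]^{n}} & \text{if } w > 0, \\[2ex] \dfrac{n+1}{n\alpha+1}\,\dfrac{dw}{[1 - (n+1)\{n(n-1)\alpha\}^{-1}w]^{n}} & \text{if } w < 0, \end{cases}$$

and it is readily seen that $k(w;\alpha)$ is a density.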
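
The closing claim of A. 3 can be checked numerically. In the sketch below (an illustration with hypothetical function names, for $n \ge 2$) the piecewise density is written as $\frac{n+1}{n\alpha+1}[1 + (n+1)w/(n-1)]^{-n}$ for $w > 0$ and $\frac{n+1}{n\alpha+1}[1 - (n+1)w/\{n(n-1)\alpha\}]^{-n}$ for $w < 0$; direct integration gives mass $1/(n\alpha+1)$ on the positive half-line and $n\alpha/(n\alpha+1)$ on the negative half-line, and a midpoint rule on a truncated range reproduces both.

```python
def k_density(w, alpha, n):
    """Piecewise density k(w; alpha) of A.3; requires n >= 2."""
    lead = (n + 1) / (n * alpha + 1)
    if w > 0:
        return lead / (1 + (n + 1) * w / (n - 1)) ** n
    return lead / (1 - (n + 1) * w / (n * (n - 1) * alpha)) ** n


def midpoint_integral(f, lo, hi, steps):
    """Plain midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def k_masses(alpha, n):
    """Numerical mass of k on w < 0 and on w > 0 (tails beyond the limits are negligible for n = 6)."""
    neg = midpoint_integral(lambda w: k_density(w, alpha, n), -3000.0, 0.0, 300_000)
    pos = midpoint_integral(lambda w: k_density(w, alpha, n), 0.0, 200.0, 200_000)
    return neg, pos


if __name__ == "__main__":
    alpha, n = 2.0, 6
    neg, pos = k_masses(alpha, n)
    print(neg, n * alpha / (n * alpha + 1))  # mass on w < 0
    print(pos, 1 / (n * alpha + 1))          # mass on w > 0
    print(neg + pos)                         # total mass, close to 1
```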
A. 4. Derivation of (3.7). Let $Y$ have the distribution (3.4) and define
$T = \sum_{i=1}^{n}|X_i - \mu_0|$, $V = |Y - \mu_0|$, where each $X_i$ is distributed by (3.1), and
so $T$ has the density (3.3). It is easily shown that $V$ has the density element

$$\frac{1}{\alpha\sigma}\,e^{-v/(\alpha\sigma)}\,dv, \qquad v \ge 0.$$

The joint density element of $V$ and $T$ is then

$$\frac{1}{\alpha\sigma^{n+1}\Gamma(n)}\,t^{n-1}\,e^{-t/\sigma}\,e^{-v/(\alpha\sigma)}\,dt\,dv.$$

If we let $w = v/t$ and $z = t$ (the absolute value of the Jacobian is $z$), the joint
density element is

$$p(w, z)\,dw\,dz = \frac{1}{\alpha\sigma^{n+1}\Gamma(n)}\,z^{n}\,e^{-z/\sigma}\,e^{-zw/(\alpha\sigma)}\,dw\,dz.$$

Integrating over $z$,

$$p(w;\alpha)\,dw = \frac{n\alpha^{n}}{(\alpha + w)^{n+1}}\,dw,$$

and it is easily verified that $p(w;\alpha)$ integrates to 1.
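
A short sketch of how the density of A. 4 appears to connect with the tolerance factors of Table IV. Integrating $p(w;1)$ from 0 to $d$ gives $1 - (1 + d)^{-n}$; assuming that $d_\beta$ is defined by setting this equal to $\beta$ (an assumption inferred from the tabulated values, not stated in this section), one obtains $d_\beta = (1-\beta)^{-1/n} - 1$. The function name is hypothetical.

```python
def tolerance_factor(beta, n):
    """Solve 1 - (1 + d)**(-n) = beta for d (the alpha = 1 case of p(w; alpha))."""
    return (1.0 - beta) ** (-1.0 / n) - 1.0


if __name__ == "__main__":
    # Compare with entries that survive in Table IV.
    for beta, n, tabulated in [(0.95, 2, 3.47214), (0.75, 11, 0.134312), (0.99, 60, 0.079775)]:
        print(n, beta, tolerance_factor(beta, n), tabulated)
```

The three computed values agree with the corresponding Table IV entries to the printed precision, which supports the assumed definition.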
PROPERTIES OF MODEL II-TYPE ANALYSIS OF VARIANCE
TESTS, A: OPTIMUM NATURE OF THE F-TEST FOR
MODEL II IN THE BALANCED CASE¹,³


By Leon H. Herbach


College of Engineering, New York University²
1. Summary. A distribution analogous to the canonical distribution used in
testing the general linear hypothesis is developed for Model II analysis of
variance for balanced classifications. As in the case of Model I analysis of variance,
this standard distribution exhibits the sums of squares going into the analysis
of variance table. By use of the standard form it is also shown that (i) all exact
F-tests used in testing hypotheses based on balanced multiple classifications
determine uniformly most powerful (u.m.p.) similar regions although they are
not likelihood ratio (L.R.) tests, but (ii) in the balanced one-way classification,
for all practical purposes, the test is an L.R. test, and is u.m.p. invariant. An
exact F-test exists when we have a sum of squares, $S_1$, distributed as $(k + \sigma_0^2)$
times a chi-square variate, where $k > 0$, independently of $S_2$, which is
distributed as $k$ times a chi-square variate. The test is then to reject the hypothesis
that $\sigma_0^2 = 0$ whenever $S_1/S_2$ is greater than some suitably chosen number, $c$.
As a corollary to property (i) it is shown that "of all invariant tests of $\sigma_0^2 = 0$
against $\sigma_0^2 > 0$ whose power is a function of $\sigma_0^2/(k + \sigma_0^2)$ only, the test $S_1/S_2 > c$
is most powerful, providing $S_1$ and $S_2$, as defined above, can be found."


2. Notation and terminology. We use the notation $p_\theta(x)$ for the probability
density function (p.d.f.) of the vector-valued random variable, $X$, which depends
on the vector-valued parameter $\theta \in \Omega$, where $\Omega$ will always represent the
unrestricted parameter space. This notation is generic so that $p$ may not be the
same density each time it appears. The difference in functional form is indicated
by the change in variable. The actual form will always be clear from the context.
This same generic notation will be used for constants; $c$ will usually be a constant,
not necessarily the same one each time it appears. It will be clear from the
context when $c$ is not a constant. The subspace of $\Omega$ specified by the hypothesis
being tested will be denoted by $\omega$. No confusion will be caused when dealing with
the hypothesis $H\colon \theta \in \omega$ if we sometimes speak of $\omega$ rather than $H$ as the
hypothesis. By a test of an hypothesis we mean any measurable function $\varphi(x)$ with the
property that $0 \le \varphi(x) \le 1$. When $X$ is observed to take on the value $x$ one
rejects $H$ with probability $\varphi(x)$.


Received May 8, 1958; revised January 15, 1959.

1 Part of the author's doctoral dissertation submitted to Columbia University [3].
2 Work partially sponsored by the Office of Naval Research.
3 Scientific Paper No. 6a, Engineering Statistics Laboratory.


3. Introduction. In Model II (components of variance model) analysis of
variance, the following stochastic model is assumed in the case of a two-way
classification with $K$ observations per cell:


$$X_{ijk} = \mu + e_i^{A} + e_j^{B} + e_{ij}^{AB} + e_{ijk}, \qquad i = 1, \ldots, I;\; j = 1, \ldots, J;\; k = 1, \ldots, K, \tag{3.1}$$

where $X_{ijk}$ is the $k$th measurement on the $(i, j)$th cell, $\mu$, the main effect, is
assumed a constant and the "components" $e_i^{A}$, $e_j^{B}$, $e_{ij}^{AB}$, $e_{ijk}$ are normally and
independently distributed (NID) with means zero and variances $\sigma_a^2$, $\sigma_b^2$, $\sigma_{ab}^2$, $\sigma_e^2$
respectively. These will be referred to as the Model II assumptions. If, as here,
one has the same number of observations in each cell, the classification is called
balanced, otherwise unbalanced.

The corresponding model for Model I (general linear hypothesis model)
analysis of variance is given by (3.1) where it is now assumed that in addition
to $\mu$, $e_i^{A}$, $e_j^{B}$ and $e_{ij}^{AB}$ are also constants, and $e_{ijk}$ are the only random variables, and
these are NID$(0, \sigma^2)$. Furthermore it is usually assumed that
$\sum_i e_i^{A} = \sum_j e_j^{B} = \sum_i e_{ij}^{AB} = \sum_j e_{ij}^{AB} = 0$. These equations for the effects may be assumed without
loss of generality in Model I but would violate the assumed independence in
Model II. The usual theoretical procedure in setting up any Model I hypothesis,
say $H_0'\colon e_i^{A} = 0$ $(i = 1, \ldots, I)$, is to find the likelihood ratio test of the
hypothesis. This gives the usual F-test. In addition to having the backing of the
intuitive appeal of the likelihood ratio test, the resulting F-test has been shown
by Hsu [4], [5], Wald [14], Wolfowitz [16] and others to have many optimum
properties.

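Analysis of variance, to many, also means a technique of calculating the
analysis of variance table given in Table 3.1, where

$$\begin{aligned}
S_1 &= IJK(\bar X_{\cdots} - \mu)^2 \\
S_2 &= JK\sum_i (\bar X_{i\cdot\cdot} - \bar X_{\cdots})^2 \\
S_3 &= IK\sum_j (\bar X_{\cdot j\cdot} - \bar X_{\cdots})^2 \\
S_4 &= K\sum_i\sum_j (\bar X_{ij\cdot} - \bar X_{i\cdot\cdot} - \bar X_{\cdot j\cdot} + \bar X_{\cdots})^2 \\
S_5 &= \sum_i\sum_j\sum_k (X_{ijk} - \bar X_{ij\cdot})^2.
\end{aligned} \tag{3.2}$$


TABLE 3.1
Analysis of Variance Table for a Balanced Two-way Classification

  d.f.                         s.s.     m.s.          E (mean square)

  ν1 = 1                       S1       S1            λ1
  ν2 = (I − 1)                 S2       S2/ν2         λ2
  ν3 = (J − 1)                 S3       S3/ν3         λ3
  ν4 = (I − 1)(J − 1)          S4       S4/ν4         λ4
  ν5 = IJ(K − 1)               S5       S5/ν5         λ5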
The mean row and E (mean square) column do not always appear in the usual
analysis of variance table and will be explained later. The statistic used in Model
I to test $H_0'$ is $(\nu_5 S_2)/(\nu_2 S_5)$, which is distributed as $F$ with $\nu_2$ and $\nu_5$ degrees of
freedom.

The procedure in Model II is to set up the analysis of variance table that is
used in Model I, and then to add a column which gives the expected mean
squares. One then notes that the five mean squares are always independently
distributed and that $S_i/\lambda_i$ is distributed as $\chi^2$ with $\nu_i$ degrees of freedom.
Using the fact that the expected mean squares are

$$\begin{aligned}
\lambda_1 &= \sigma_e^2 + K\sigma_{ab}^2 + JK\sigma_a^2 + IK\sigma_b^2 \\
\lambda_2 &= \sigma_e^2 + K\sigma_{ab}^2 + JK\sigma_a^2 \\
\lambda_3 &= \sigma_e^2 + K\sigma_{ab}^2 + IK\sigma_b^2 \\
\lambda_4 &= \sigma_e^2 + K\sigma_{ab}^2 \\
\lambda_5 &= \sigma_e^2
\end{aligned} \tag{3.3}$$

we have, under the hypothesis of no Model II A effect ($H_0\colon \sigma_a^2 = 0$), that $\lambda_2 = \lambda_4$
and $(\nu_4 S_2)/(\nu_2 S_4)$ is distributed as $F$ with $\nu_2$ and $\nu_4$ degrees of freedom. This,
in fact, is the F-statistic used in testing $H_0$. All exact F-tests used in Model II
are obtained by taking ratios of mean squares which have equal $\lambda$'s under the
hypothesis, whenever there are equal $\lambda$'s. No attempt has been made previously
to show that these tests are optimum or to even show they are likelihood ratio
(L.R.) tests, which they sometimes are not. Two of the purposes of this paper
are to derive optimum properties for some tests of Model II hypotheses and to
show that in this model the analysis of variance table can be obtained without
borrowing it from Model I.


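4. Some useful lemmas for a certain matrix. The following $n \times n$ matrix
plays an important role in what follows:

$$A = \begin{pmatrix} a+b & a & \cdots & a \\ a & a+b & \cdots & a \\ \vdots & \vdots & \ddots & \vdots \\ a & a & \cdots & a+b \end{pmatrix}, \tag{4.1}$$

where $a$ and $b$ are either scalars or square matrices of the same size. Since $A$ is a
function of $a + b$ and $a$ only, the notation

$$A = (a + b\,\backslash\,a) \tag{4.2}$$

shall be used.⁴


4 It may be noted that for $a$, $b$ scalars, $A = b\mathcal{I} + a\mathcal{I}^{*}$, where $\mathcal{I}$ is the unit matrix and
$\mathcal{I}^{*}$ is the matrix with all elements unity.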
We shall make use of the following two lemmas:
Lemma 4.1: If $A$ is of the form (4.1) then the determinant of $A$ satisfies

$$|A| = |b + na|\,|b|^{n-1},$$

where $|D|$ means the determinant of the matrix $D$.

The proof is exactly the same as that given by Wilks ([15], p. 109) for the
case in which $a$ and $b$ are scalars.

Lemma 4.2: If $a$, $b$ are real numbers with $b(b + na) \ne 0$,

$$A^{-1} = \frac{1}{(b + na)b}\,\big([b + (n-1)a]\,\backslash\,{-a}\big). \tag{4.3}$$


5. Standard form for the balanced two-way classification. Consider the
two-way Model II classification with $K$ observations per cell, given by (3.1). Let the
transpose of the observation vector $X$ be
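
For scalar $a$ and $b$ the two lemmas are easy to confirm on a small case. The sketch below is illustrative only (helper names are hypothetical): it builds the patterned matrix with $a + b$ on the diagonal and $a$ elsewhere, using exact rational arithmetic, and checks that $|A| = (b + na)b^{n-1}$ and that $\{(b+na)b\}^{-1}$ times the patterned matrix with diagonal $b + (n-1)a$ and off-diagonal $-a$ is the inverse of $A$.

```python
from fractions import Fraction


def pattern_matrix(diag, off, n):
    """n x n matrix with `diag` on the diagonal and `off` elsewhere, i.e. (diag \\ off)."""
    return [[Fraction(diag) if i == j else Fraction(off) for j in range(n)] for i in range(n)]


def determinant(matrix):
    """Exact determinant by Gaussian elimination on a copy of the matrix."""
    m = [row[:] for row in matrix]
    n, det = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


def matmul(p, q):
    """Product of two square matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    a, b, n = 2, 3, 4
    A = pattern_matrix(a + b, a, n)
    print(determinant(A), (b + n * a) * b ** (n - 1))           # Lemma 4.1: both sides
    scale = Fraction(1, (b + n * a) * b)
    A_inv = [[scale * v for v in row] for row in pattern_matrix(b + (n - 1) * a, -a, n)]
    print(matmul(A, A_inv) == pattern_matrix(1, 0, n))          # Lemma 4.2
```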


$$\begin{aligned} X' = (&X_{111}, X_{211}, X_{311}, \ldots, X_{I11};\; X_{121}, X_{221}, \ldots, X_{I21};\; \ldots;\; X_{1J1}, X_{2J1}, \ldots, X_{IJ1}; \\ &X_{112}, X_{212}, \ldots, X_{I12};\; \ldots;\; X_{1J2}, \ldots, X_{IJ2};\; \ldots;\; X_{11K}, X_{21K}, X_{31K}, \ldots, X_{IJK}), \end{aligned} \tag{5.1}$$

that is, the triplets $i, j, k$ are ordered so that

$(i, j, k)$ precedes $(i', j, k)$ if $i < i'$,

$(i, j, k)$ precedes $(i', j', k)$ if $j < j'$,

$(i, j, k)$ precedes $(i', j', k')$ if $k < k'$.

Let $\Sigma$ be the covariance matrix of $X$ and $N = IJK$. Then it is known that
there exists an orthogonal matrix $D$ with the following properties: (a) its first
row is $N^{-1/2}\delta'$, where

$$\delta' = (1, 1, \ldots, 1), \tag{5.2}$$

a $1 \times N$ row vector, (b) the covariance matrix of $Z = DX$ is $D\Sigma D' = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$,
$\lambda_i > 0$ and (c) the $\lambda$'s are the roots of the characteristic
equation $|\Sigma - \lambda\mathcal{I}| = 0$, where $\mathcal{I}$ is the identity matrix.
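We shall now find these $\lambda$'s in terms of $\sigma_a^2$, $\sigma_b^2$, $\sigma_{ab}^2$ and $\sigma_e^2$. Now

$$|\Sigma - \lambda\mathcal{I}| = |(\mathcal{A}\backslash\mathcal{B})|_{K \times K}$$

where

$$\begin{aligned}
\mathcal{A} &= (A_1\backslash B_1)_{J \times J}, \qquad \mathcal{B} = (A_2\backslash B_2)_{J \times J}, \\
A_1 &= (\sigma_a^2 + \sigma_b^2 + \sigma_{ab}^2 + \sigma_e^2 - \lambda \,\backslash\, \sigma_b^2), \\
B_1 &= (\sigma_a^2\backslash 0) = B_2, \\
A_2 &= (\sigma_a^2 + \sigma_b^2 + \sigma_{ab}^2 \,\backslash\, \sigma_b^2).
\end{aligned} \tag{5.3}$$

It should be noted that $\Sigma$ is an $N \times N$ matrix of scalars, but is a $K \times K$ matrix
when the elements are submatrices ($\mathcal{A}$'s and $\mathcal{B}$'s). Repeated use of Lemma
4.1 yields

$$\begin{aligned}
|\Sigma - \lambda\mathcal{I}| &= |(\mathcal{A} - \mathcal{B}) + K\mathcal{B}|\cdot|\mathcal{A} - \mathcal{B}|^{K-1} \\
&= |(A_1 + (K-1)A_2 \,\backslash\, KB_1)|\cdot|(A_1 - A_2\backslash 0)|^{K-1} \\
&= |A_1 + (K-1)A_2 + K(J-1)B_2|\cdot|A_1 + (K-1)A_2 - KB_1|^{J-1}\cdot|A_1 - A_2|^{J(K-1)} \\
&= D_1 \cdot D_2 \cdot D_3, \text{ say,}
\end{aligned}$$

where

$$\begin{aligned}
D_1 &= |(K(\sigma_a^2 + \sigma_b^2 + \sigma_{ab}^2) + \sigma_e^2 + K(J-1)\sigma_a^2 - \lambda \,\backslash\, K\sigma_b^2)| \\
&= |K\sigma_{ab}^2 + \sigma_e^2 + JK\sigma_a^2 + IK\sigma_b^2 - \lambda|\cdot|K\sigma_{ab}^2 + \sigma_e^2 + JK\sigma_a^2 - \lambda|^{I-1}, \\
D_2 &= |(K(\sigma_a^2 + \sigma_b^2 + \sigma_{ab}^2) + \sigma_e^2 - \lambda - K\sigma_a^2 \,\backslash\, K\sigma_b^2)|^{J-1} \\
&= |K\sigma_{ab}^2 + IK\sigma_b^2 + \sigma_e^2 - \lambda|^{J-1}\cdot|K\sigma_{ab}^2 + \sigma_e^2 - \lambda|^{(I-1)(J-1)}, \\
D_3 &= |(\sigma_e^2 - \lambda\backslash 0)|^{J(K-1)} = |\sigma_e^2 - \lambda|^{IJ(K-1)}.
\end{aligned}$$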

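Therefore the values of the $N = IJK$ characteristic roots are

$$\begin{aligned}
\sigma_e^2 + K\sigma_{ab}^2 + IK\sigma_b^2 + JK\sigma_a^2 &= \lambda_1, \text{ say, with multiplicity } 1 \\
\sigma_e^2 + K\sigma_{ab}^2 + JK\sigma_a^2 &= \lambda_2, \text{ say, with multiplicity } (I - 1) \\
\sigma_e^2 + K\sigma_{ab}^2 + IK\sigma_b^2 &= \lambda_3, \text{ say, with multiplicity } (J - 1) \\
\sigma_e^2 + K\sigma_{ab}^2 &= \lambda_4, \text{ say, with multiplicity } (I - 1)(J - 1) \\
\sigma_e^2 &= \lambda_5, \text{ say, with multiplicity } IJ(K - 1).
\end{aligned}$$

These $\lambda$'s are the same as the ones defined by (3.3). The orthogonality of $D$,
the property that the first row of $D$ is $N^{-1/2}\delta'$ and the fact that $EX_{ijk} = \mu$ imply
that

$$EZ_{111} = \sqrt{N}\,\mu, \qquad EZ_{ijk} = 0 \text{ for } (i, j, k) \ne (1, 1, 1). \tag{5.4}$$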
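
The characteristic roots and multiplicities listed above can be checked on a small case without an eigenvalue routine. The sketch below is only an illustration under the Model II covariance structure implied by (3.1), namely $\mathrm{cov}(X_{ijk}, X_{i'j'k'}) = \sigma_a^2[i = i'] + \sigma_b^2[j = j'] + \sigma_{ab}^2[i = i', j = j'] + \sigma_e^2[i = i', j = j', k = k']$; function names and the variance values are hypothetical. It verifies that $|\Sigma - \lambda\mathcal{I}|$ vanishes at each root and that $|\Sigma|$ equals the product of the roots raised to their multiplicities.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination on a copy of the matrix."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n, det = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


def model2_covariance(I, J, K, va, vb, vab, ve):
    """Covariance matrix of X with i varying fastest, then j, then k."""
    idx = [(i, j, k) for k in range(K) for j in range(J) for i in range(I)]
    return [[va * (p[0] == q[0]) + vb * (p[1] == q[1])
             + vab * (p[0] == q[0] and p[1] == q[1]) + ve * (p == q)
             for q in idx] for p in idx]


def model2_roots(I, J, K, va, vb, vab, ve):
    """List of (root, multiplicity) pairs as given in the text."""
    return [(ve + K * vab + J * K * va + I * K * vb, 1),
            (ve + K * vab + J * K * va, I - 1),
            (ve + K * vab + I * K * vb, J - 1),
            (ve + K * vab, (I - 1) * (J - 1)),
            (ve, I * J * (K - 1))]


if __name__ == "__main__":
    args = (2, 2, 2, 1, 2, 3, 4)   # I, J, K and made-up variances
    sigma = model2_covariance(*args)
    product = 1
    for root, mult in model2_roots(*args):
        shifted = [[v - root * (r == c) for c, v in enumerate(row)] for r, row in enumerate(sigma)]
        print(root, mult, determinant(shifted))   # determinant is 0 at each root
        product *= root ** mult
    print(determinant(sigma), product)            # the two values agree
```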


After the dissertation [3] was defended but before this paper was prepared, 
Dr. Howard Levene called the author’s attention to the work of Nelder [11] 
whose method for obtaining the latent roots of a special case of matrices of the 
form (4.1) can be generalized to find our eigenvalues. However, it is felt that 
the algorithm, using Lemma 4.1 is more convenient, especially when higher 
multiple classifications are treated. 

Let $\zeta = EZ$, the vector given in (5.4). We have shown that the vector
variable $X$ defined by (3.1), which is distributed as $N(\mu\delta, \Sigma)$ where $\Sigma = (\mathcal{A}\backslash\mathcal{B})$
and $\mathcal{A}$, $\mathcal{B}$ are defined by (5.3) when $\lambda = 0$, may be transformed by an orthogonal
matrix to yield a variable $Z$ which has the following density:

$$(2\pi)^{-N/2}|\Lambda|^{-1/2}\exp\{-\tfrac12 (z - \zeta)'\Lambda^{-1}(z - \zeta)\} = \frac{|\Lambda|^{-1/2}}{(2\pi)^{N/2}}\exp\Big\{-\frac12\Big(\frac{s_1}{\lambda_1} + \frac{s_2}{\lambda_2} + \frac{s_3}{\lambda_3} + \frac{s_4}{\lambda_4} + \frac{s_5}{\lambda_5}\Big)\Big\} \tag{5.5}$$

where $\lambda_1, \ldots, \lambda_5$ are given by (3.3) and

$$\begin{aligned}
s_1 &= (z_{111} - \sqrt{N}\mu)^2 \\
s_2 &= \sum_{i=2}^{I} z_{i11}^2 \\
s_3 &= \sum_{j=2}^{J} z_{1j1}^2 \\
s_4 &= \sum_{i=2}^{I}\sum_{j=2}^{J} z_{ij1}^2 \\
s_5 &= \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=2}^{K} z_{ijk}^2.
\end{aligned} \tag{5.6}$$

The reader should note that $s_1$ is not a statistic, since it contains $\mu$. This
particular expression for $s_1$ is used because of the symmetry it gives to (5.5), which
we shall refer to as the Model II standard form of the probability density for
the case of a balanced two way classification with $K$ observations per cell.
Note that (3.3) implies that

$$\lambda_1 = \lambda_2 + \lambda_3 - \lambda_4, \tag{5.7}$$

a fact which will be used later.
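For completeness the Tang canonical form [13] of the joint density of (3.1)
when the Model I assumptions are made will now be given. Then

$$X_{ijk}\colon \mathrm{NID}(\mu_{ij}, \sigma^2)$$

where

$$\mu_{ij} = \mu + e_i^{A} + e_j^{B} + e_{ij}^{AB}.$$

Tang showed that there exists an orthogonal $N \times N$ matrix $D$ whose first row
is $N^{-1/2}\delta'$ for which $Z = DX$ has the density,

$$\frac{1}{(2\pi)^{N/2}\sigma^{N}}\exp\Big\{-\frac{1}{2\sigma^2}\Big[(z_{111} - \sqrt{N}\mu)^2 + \sum_{i=2}^{I}(z_{i11} - \alpha_i^{A})^2 + \sum_{j=2}^{J}(z_{1j1} - \alpha_j^{B})^2 + \sum_{i=2}^{I}\sum_{j=2}^{J}(z_{ij1} - \alpha_{ij}^{AB})^2 + \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=2}^{K} z_{ijk}^2\Big]\Big\} \tag{5.8}$$

where the $\alpha_i^{A}$ ($\alpha_j^{B}$, $\alpha_{ij}^{AB}$) are linear combinations of the $e_i^{A}$ ($e_j^{B}$, $e_{ij}^{AB}$) such that
$\alpha_i^{A}$ ($\alpha_j^{B}$, $\alpha_{ij}^{AB}$) are zero if and only if all $e_i^{A}$ ($e_j^{B}$, $e_{ij}^{AB}$) are zero. It should be noted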
that in Model I one transforms to change the means while in Model II one
does so to change the covariance matrix. It can be shown [3] that the same
orthogonal transformation, $D$, could be used in both models, so that we are
justified in using the same letter $Z$ in (5.6) and (5.8). Also, Tang showed for
Model I, that $S_1, \ldots, S_5$ as given in (3.2) and (5.6) are the same, except that
(3.2) expresses the $S$'s in terms of original variates while (5.6) does so in terms
of transformed variates. Since the transformations are the same in both models
this shows that the sums of squares in (3.2) and (5.6) are the same in Model
II. From this one can argue that the standard distribution (5.5) can be obtained
from the analysis of variance table above since it is known that all rows are
independently distributed. This was not done because we do not yet know
what the properties of the tests based on Table 3.1 are. We propose to get the
table, tests and optimum properties of the tests from (5.5), the standard form,
which is easier to handle than the density of $X$, although all tests of hypotheses
based on $X$ can be transformed to tests based on $Z$. It should be noted that
(5.5) is the density of $Z$ although it is written in terms of the $s$'s. From (5.5)
and (5.6) it is clear that $\bar X = N^{-1/2}Z_{111}$, $S_2$, $S_3$, $S_4$ and $S_5$ are independently
distributed as a normal variate and four multiples of $\chi^2$ with $(I - 1)$, $(J - 1)$,
$(I - 1)(J - 1)$ and $IJ(K - 1)$ degrees of freedom respectively. In the sequel
we shall use this latter joint density, namely,

$$p(\bar x, s_2, s_3, s_4, s_5) = \Big(\frac{N}{2\pi\lambda_1}\Big)^{1/2}\exp\Big[-\frac{N(\bar x - \mu)^2}{2\lambda_1}\Big]\prod_{i=2}^{5}\frac{s_i^{(\nu_i/2) - 1}e^{-s_i/2\lambda_i}}{(2\lambda_i)^{\nu_i/2}\,\Gamma(\nu_i/2)}. \tag{5.9}$$

Densities (5.5) and (5.8) or (5.9) and (5.8) show clearly that under the
hypothesis of no A effect, $H_0$ and $H_0'$ respectively, $S_2/S_4$ and $S_2/S_5$ respectively
are distributed as a multiple of $F$ with the degrees of freedom indicated by the
number of standard variates in each $S$. These are the statistics indicated at the
end of Section 3. All F-tests used to test the non-existence of certain effects can
be obtained this way.


6. Uniformly most powerful similar test for testing non-existence of main
effects in the balanced one or two-way classification. This section will be
devoted to showing that the F-test is the u.m.p. similar test for testing $\omega\colon \sigma_a^2 = 0$
against $\Omega - \omega\colon \sigma_a^2 > 0$ when one has a balanced one or two way Model II
classification. Although the hypothesis to be tested is actually $\sigma_a^2 = 0$, $\sigma_b^2 \ge 0$, $\sigma_{ab}^2 \ge 0$
and $\sigma_e^2 > 0$ we defer to the usual practice of not explicitly stating the other
inequalities when no confusion will result. A similar statement can be made in
regard to the alternative hypothesis. In the two-way classification

$$\Omega = \{\theta \mid -\infty < \mu < \infty;\; 0 < \lambda_5 \le \lambda_4 \le \lambda_3 \le \lambda_1 = \lambda_2 + \lambda_3 - \lambda_4 < \infty;\; \lambda_4 \le \lambda_2 \le \lambda_1\} \tag{6.1}$$

$$\omega = \{\theta \mid -\infty < \mu < \infty;\; 0 < \lambda_5 \le \lambda_2 = \lambda_4 \le \lambda_3 = \lambda_1 < \infty\} \tag{6.2}$$
where $\theta = (\mu, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5)$. Since $z_{111} = \sqrt{N}\bar x$ it can be seen from (5.5)
that a sufficient statistic under $\omega$, for the two-way classification is

$$T = (\bar X, S_3, S_5, U) \tag{6.3}$$

where

$$U = S_2 + S_4. \tag{6.4}$$

We first prove the following
THEOREM 6.1: For the standard distribution of the two-way Model II
classification (5.5) the statistic $T$ defined by (6.3) is complete on $\omega$, where $\omega$ is determined
by the hypothesis $\sigma_a^2 = 0$ and is defined by (6.2).
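PROOF: By the definition of completeness [7] we need to show that
$E_\theta f(T) = 0$ for all $\theta \in \omega$
implies $f(t) = 0$, (a.e.). For $\theta \in \omega$, we have, using (5.9),

$$E_\theta f(T) = c(\theta)\int_{-\infty}^{\infty}\int_0^{\infty}\int_0^{\infty}\int_0^{\infty} f(t)\,g(t, \lambda_3)\,h(t, \theta)\,ds_3\,du\,ds_5\,d\bar x \tag{6.5}$$

where

$$g(t, \lambda_3) = \exp\Big\{-\frac{N\bar x^2}{2\lambda_3}\Big\}\,s_3^{(\nu_3 - 2)/2}\,u^{(\nu_2 + \nu_4 - 2)/2}\,s_5^{(\nu_5 - 2)/2} \tag{6.6}$$

and

$$h(t, \theta) = \exp\Big\{\frac{N\mu\bar x}{\lambda_3} - \frac{s_3}{2\lambda_3} - \frac{u}{2\lambda_4} - \frac{s_5}{2\lambda_5}\Big\}. \tag{6.7}$$

Let $S_3^{*} = S_3 + N\bar X^2$ and $T^{*} = (\bar X, S_3^{*}, S_5, U)$. Changing the variable of
integration in (6.5) to $t^{*}$ one gets for $\theta \in \omega$,

$$E_\theta f(T) = c(\theta)\int\!\!\int\!\!\int\!\!\int f^{*}(t^{*})\,g^{*}(t^{*})\exp\Big\{\frac{N\mu\bar x}{\lambda_3} - \frac{s_3^{*}}{2\lambda_3} - \frac{u}{2\lambda_4} - \frac{s_5}{2\lambda_5}\Big\}\,ds_3^{*}\,du\,ds_5\,d\bar x \tag{6.8}$$

where

$$f^{*}(t^{*}) = \begin{cases} f(t) & \text{if } s_3^{*} > N\bar x^2 \\ 0 & \text{otherwise} \end{cases} \tag{6.9}$$

and

$$g^{*}(t^{*}) = (s_3^{*} - N\bar x^2)^{(\nu_3 - 2)/2}\,u^{(\nu_2 + \nu_4 - 2)/2}\,s_5^{(\nu_5 - 2)/2}. \tag{6.10}$$

By the unicity property of the quadruple Laplace transform, (6.8) is
identically zero for $\theta$ in a non-degenerate interval only if

$$f^{*}(t^{*})\,g^{*}(t^{*}) = 0 \quad \text{(a.e.)}. \tag{6.11}$$

Now $g^{*}(t^{*}) \ne 0$ (a.e.). Thus (6.11) and (6.9) imply that $f(t) = 0$ (a.e.) and
the theorem is proved.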


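Let $V$ be defined by the following 1:1 transformation

$$\begin{aligned} U &= S_2 + S_4 & \qquad S_2 &= UV \\ V &= S_2/(S_2 + S_4) & \qquad S_4 &= U(1 - V). \end{aligned} \tag{6.12}$$

Since, as can be seen from (5.5), $(\bar X, S_2, S_3, S_4, S_5)$ is sufficient under $\Omega$ then
$W = (T, V)$ is also. Using (6.12) and the density of $(S_2, S_4)$ given by (5.9)
we have for the density of $(U, V)$,

$$p_\theta(u, v) = c(\theta)\,u^{(\nu_2 + \nu_4 - 2)/2}\,v^{(\nu_2 - 2)/2}\,(1 - v)^{(\nu_4 - 2)/2}\exp\Big\{-\frac{uv}{2}\Big(\frac{1}{\lambda_2} - \frac{1}{\lambda_4}\Big) - \frac{u}{2\lambda_4}\Big\}. \tag{6.13}$$

But under $\omega$, $\lambda_2 = \lambda_4$ and (6.13) becomes for $\theta \in \omega$

$$p_\theta(u, v) = c(\theta)\,u^{(\nu_2 + \nu_4 - 2)/2}\,e^{-u/2\lambda_4}\,v^{(\nu_2 - 2)/2}\,(1 - v)^{(\nu_4 - 2)/2}, \tag{6.14}$$

which shows that $U$ and $V$ are independent under $\omega$. Since (5.9) and (6.4)
show clearly that $(\bar X, S_3, S_5)$ and $(U, V)$ are always independent this means
that $T$ and $V$ are independent under $\omega$ and we have

$$p_\theta(v \mid t) = p(v) = c\,v^{(\nu_2 - 2)/2}\,(1 - v)^{(\nu_4 - 2)/2}, \quad \text{for } \theta \in \omega. \tag{6.15}$$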
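
The independence of $U$ and $V$ under $\omega$, and the beta form of the density of $V$, can be illustrated by simulation. This is a sketch under stated assumptions rather than part of the proof: $S_2$ and $S_4$ are drawn as $\lambda\chi^2_{\nu_2}$ and $\lambda\chi^2_{\nu_4}$ with a common $\lambda$ (that is, $\lambda_2 = \lambda_4$), using the gamma representation of chi-square; the function names and parameter values are hypothetical. The mean of $V$ should be near $\nu_2/(\nu_2 + \nu_4)$ and the sample correlation of $U$ and $V$ near zero.

```python
import random


def simulate_u_v(nu2, nu4, lam, reps, seed=1):
    """Draw (U, V) with S2 ~ lam*chi2(nu2), S4 ~ lam*chi2(nu4), U = S2 + S4, V = S2 / U."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(reps):
        # lam * chi-square(nu) is a gamma variate with shape nu/2 and scale 2*lam.
        s2 = rng.gammavariate(nu2 / 2.0, 2.0 * lam)
        s4 = rng.gammavariate(nu4 / 2.0, 2.0 * lam)
        u = s2 + s4
        pairs.append((u, s2 / u))
    return pairs


def correlation(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    pairs = simulate_u_v(nu2=3, nu4=6, lam=2.5, reps=20_000)
    mean_v = sum(v for _, v in pairs) / len(pairs)
    print(mean_v, 3 / (3 + 6))    # mean of a Beta(nu2/2, nu4/2) variate
    print(correlation(pairs))     # close to zero under omega
```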
Now we are in a position to prove
THEOREM 6.2: The F-test, which rejects the hypothesis when $V$ is greater than
some constant, determines a uniformly most powerful similar region for testing
$\omega\colon \sigma_a^2 = 0$ against $\Omega - \omega\colon \sigma_a^2 > 0$.
PROOF: We make use of the fact [7, p. 317] that if $T$ is a sufficient statistic
for $\theta \in \omega$, and if $T$ is complete on $\omega$ then all similar tests of size $\alpha$,

$$E_\theta\varphi(W) = \alpha, \qquad \theta \in \omega, \quad W = (T, V),$$

have Neyman structure [12] with respect to $T$, i.e. satisfy

$$\int_0^1 \varphi(t, v)\,p_\theta(v \mid t)\,dv = \alpha, \quad \text{(a.e.) for all } \theta \in \omega. \tag{6.16}$$

Subject to this we wish to maximize the power at a particular alternative
$\theta_1 \in \Omega - \omega$; that is we desire

$$\int\Big[\int_0^1 \varphi(t, v)\,p_{\theta_1}(v \mid t)\,dv\Big]\,p_{\theta_1}(t)\,dt = \text{maximum}.$$

Using (6.14) and (6.15) these conditions become

$$c\int_0^1 \varphi(t, v)\,v^{(\nu_2 - 2)/2}\,(1 - v)^{(\nu_4 - 2)/2}\,dv = \alpha \tag{6.17}$$
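and

$$\int\Big[\int_0^1 \varphi(t, v)\,p_{\theta_1}(v \mid t)\,dv\Big]\,p_{\theta_1}(t)\,dt = \max$$

respectively, where

$$p_{\theta_1}(v \mid t) = \frac{c(\theta_1)\,u^{(\nu_2 + \nu_4 - 2)/2}\,v^{(\nu_2 - 2)/2}\,(1 - v)^{(\nu_4 - 2)/2}\exp\Big\{-\dfrac{uv}{2}\Big(\dfrac{1}{\lambda_2} - \dfrac{1}{\lambda_4}\Big) - \dfrac{u}{2\lambda_4}\Big\}}{p_{\theta_1}(u)}.$$

This will be achieved if for each value of $t$ we have

$$c\int_0^1 \varphi(t, v)\,u^{(\nu_2 + \nu_4 - 2)/2}\,v^{(\nu_2 - 2)/2}\,(1 - v)^{(\nu_4 - 2)/2}\exp\Big\{-\frac{uv}{2}\Big(\frac{1}{\lambda_2} - \frac{1}{\lambda_4}\Big) - \frac{u}{2\lambda_4}\Big\}\,dv = \max, \tag{6.18}$$

where we recall $t = (\bar x, s_3, s_5, u)$. But, finding for fixed $t$ (and a fortiori for
fixed $u$) a test $\varphi(t, v)$ satisfying (6.17) and (6.18) is a problem whose solution
is given at once by the fundamental Neyman-Pearson lemma to be $\varphi(t, v) = 1$
when

$$c\,u^{(\nu_2 + \nu_4 - 2)/2}\,v^{(\nu_2 - 2)/2}\,(1 - v)^{(\nu_4 - 2)/2}\exp\Big\{-\frac{uv}{2}\Big(\frac{1}{\lambda_2} - \frac{1}{\lambda_4}\Big) - \frac{u}{2\lambda_4}\Big\} \ge c\,v^{(\nu_2 - 2)/2}\,(1 - v)^{(\nu_4 - 2)/2}$$

or

$$\exp\Big\{-\frac{uv}{2}\Big(\frac{1}{\lambda_2} - \frac{1}{\lambda_4}\Big)\Big\} > c(\theta_1, t)$$

or $e^{kv} > c(\theta_1, t)$, $k > 0$,
or $v > c(\theta_1, t)$.
The "constant", $c = c(\theta_1, t)$ is determined by (6.17) or

$$\frac{1}{B(\nu_2/2, \nu_4/2)}\int_c^1 v^{(\nu_2 - 2)/2}\,(1 - v)^{(\nu_4 - 2)/2}\,dv = \alpha.$$

Consequently $c$ is independent of both $\theta_1$ and $t$,

$$\varphi(t, v) = 1 \quad \text{when} \quad v = \frac{s_2}{s_2 + s_4} > c,$$

and the usual F-test is u.m.p. similar.⁵

Of course, Theorem 6.2 was proved only for the balanced two-way
classification, but using the standard form in the next section it can be proved in the
same way for the balanced one-way classification.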

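5 In commenting on an earlier draft of this paper, Dr. Werner Gautschi pointed out that
in testing $\omega\colon \sigma_a^2 + \sigma_b^2 = 0$, the $T$ corresponding to (6.3), namely $T = (\bar X, S_2 + S_3 + S_4, S_5)$,
is complete on $\omega$, but the method of Theorem 6.2 does not seem to help one show that the
test based on $(S_2 + S_3)/(S_2 + S_3 + S_4)$ is u.m.p. similar.

To show where the proof breaks down when applied to testing the hypothesis
of $\omega\colon \sigma_{ab}^2 = 0$ against $\Omega - \omega\colon \sigma_{ab}^2 > 0$ we try to prove the analogue of Theorem
6.1. The region $\Omega$ is still given by (6.1), but $\omega$ is now given by

$$\omega = \{\theta \mid -\infty < \mu < \infty;\; 0 < \lambda_5 = \lambda_4 \le \lambda_3 \le \lambda_1 = \lambda_2 + \lambda_3 - \lambda_4 < \infty;\; \lambda_4 \le \lambda_2 \le \lambda_1 < \infty\}.$$

Clearly a sufficient statistic under $\omega$ is $T = (\bar X, S_2, S_3, U')$, where
$U' = (S_4 + S_5)$.

Now

$$E_\theta f(T) = c(\theta)\int\!\!\int\!\!\int\!\!\int f(t)\,g(t, \lambda_2 + \lambda_3 - \lambda_4)\,h(t, \theta)\,du'\,ds_3\,ds_2\,d\bar x$$

where

$$g(t, \lambda_2 + \lambda_3 - \lambda_4) = \exp\Big\{-\frac{N\bar x^2}{2(\lambda_2 + \lambda_3 - \lambda_4)}\Big\}\,s_2^{(\nu_2 - 2)/2}\,s_3^{(\nu_3 - 2)/2}\,(u')^{(\nu_4 + \nu_5 - 2)/2}$$

and

$$h(t, \theta) = \exp\Big\{\frac{N\mu\bar x}{\lambda_2 + \lambda_3 - \lambda_4} - \frac{s_2}{2\lambda_2} - \frac{s_3}{2\lambda_3} - \frac{u'}{2\lambda_4}\Big\}.$$

The proof of Theorem 6.1 made use of the fact that

$$\exp\Big\{-\frac{N\bar x^2}{2\lambda_3}\Big\}\exp\Big\{-\frac{s_3}{2\lambda_3}\Big\} = \exp\Big\{-\frac{s_3^{*}}{2\lambda_3}\Big\}$$

for $S_3^{*} = S_3 + N\bar X^2$. This method will not work here because the $\lambda_1$ associated
with the mean, viz. $\lambda_1 = \lambda_2 + \lambda_3 - \lambda_4$, does not equal any other $\lambda_i$. However,
a lemma due to Gautschi⁶ [17] and appearing in this issue of the
Annals can be used to show completeness under this $\omega$ and thus that the F-test
of $\sigma_{ab}^2 = 0$ is u.m.p. similar.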


6 Dr. Gautschi independently derived the standard form which proved so useful in
this work.


7. Likelihood ratio test for the balanced one way classification. We shall show
that for the balanced, one-way classification the likelihood ratio (L.R.) test is
not the F-test, but for purposes of significance testing we can act as if it were.
Let us consider $I$ populations where the $j$th measurement on the $i$th population
is given by

$$X_{ij} = \mu + e_i^{A} + e_{ij}, \tag{7.1}$$

$$i = 1, 2, \ldots, I; \qquad j = 1, 2, \ldots, J. \tag{7.2}$$

The usual Model II assumptions are made, namely, that $\mu$ is a constant and
$e_i^{A}$, $e_{ij}$ are normally and independently distributed with zero means and
variances $\sigma_a^2$, $\sigma_e^2$ respectively. Let $D$ be the usual orthogonal transformation that
transforms $X_{ij}$, suitably ordered, to $Z_{ij}$ which have the standard distribution

$$\frac{|\Lambda|^{-1/2}}{(2\pi)^{N/2}}\exp\Big\{-\frac12\Big(\frac{s_1}{\lambda_1} + \frac{s_2}{\lambda_2} + \frac{s_3}{\lambda_3}\Big)\Big\} \tag{7.3}$$

where $s_1 = (z_{11} - \sqrt{N}\mu)^2$, $s_2 = \sum_{i=2}^{I} z_{i1}^2$, $s_3 = \sum_{i=1}^{I}\sum_{j=2}^{J} z_{ij}^2$ and

$$\lambda_1 = \lambda_2 = \sigma_e^2 + J\sigma_a^2 \tag{7.4}$$

$$\lambda_3 = \sigma_e^2 \tag{7.5}$$

$$|\Lambda| = \lambda_1\lambda_2^{I-1}\lambda_3^{I(J-1)}. \tag{7.6}$$

Clearly

$$\lambda_2 \ge \lambda_3 > 0. \tag{7.7}$$

To test $H_0\colon \sigma_a^2 = 0$ (or $\lambda_2 = \lambda_3$) it is well known that the usual F-test is
equivalent to rejecting $H_0$ if $G > C$ where $G = S_2/S_3$ and $C$ is a constant which
depends on the level of significance.

The maximum likelihood (M.L.) estimates, $\hat\mu_\Omega$, $\hat\lambda_{2\Omega}$, $\hat\lambda_{3\Omega}$, are values which
maximize the likelihood (7.3) subject to the condition that (7.7) is satisfied
by the estimates, i.e.

$$\hat\lambda_{2\Omega} \ge \hat\lambda_{3\Omega}. \tag{7.8}$$

Equating to zero the derivatives of the likelihood with respect to $\mu$, $\lambda_2$, $\lambda_3$ one
gets as solutions

$$\hat\mu'_\Omega = z_{11}/\sqrt{N} \tag{7.9}$$

$$\hat\lambda'_{2\Omega} = S_2/I \tag{7.10}$$

$$\hat\lambda'_{3\Omega} = S_3/[I(J - 1)]. \tag{7.11}$$

Since differentiation may give, as solutions, values which do not satisfy
condition (7.8), these estimates have primes to distinguish them from the "correct"
M.L. estimates, which do satisfy (7.8) and are unprimed. That is, if

$$\hat\lambda'_{2\Omega} \ge \hat\lambda'_{3\Omega},$$

then (7.10) and (7.11) are the correct M.L. estimates. Because $\hat\lambda'_{2\Omega} < \hat\lambda'_{3\Omega}$ is
equivalent to $G < (J - 1)^{-1}$ it remains only to see what the estimates are
when (7.10) and (7.11) do not give the "correct" M.L. estimates, i.e., when
$G < (J - 1)^{-1}$. Since $L$, the logarithm of the likelihood, may be written as a
function of $\lambda_2$ plus a function of $\lambda_3$ it is clear that the value of $\lambda_2$ that
maximizes $L$, considered as a mathematical function defined for all positive $\lambda_2$ and
$\lambda_3$, rather than as a likelihood (i.e. disregarding the restriction $\lambda_2 \ge \lambda_3$), for
fixed $\lambda_3$ is the same $\hat\lambda'_{2\Omega}$ as is given by (7.10) and similarly for the value of $\lambda_3$
that maximizes $L$ for fixed $\lambda_2$. Also $\partial L/\partial\lambda_2 \le 0$ or $\partial L/\partial\lambda_3 \le 0$ according as
$\lambda_2 \ge S_2/I$ or $\lambda_3 \ge S_3/[I(J - 1)]$. This means that for any fixed $\lambda_2$, $L$ decreases
as $\lambda_3$ moves away from $\hat\lambda'_{3\Omega}$ in either direction and similarly for $\lambda_2$ and $\hat\lambda'_{2\Omega}$ when
$\lambda_3$ is fixed. Now, by (7.8), the point $(\hat\lambda_{2\Omega}, \hat\lambda_{3\Omega})$ in the $\lambda_2$, $\lambda_3$ plane cannot lie above
the line $\lambda_3 = \lambda_2$. Suppose it were (strictly) below this line and $(\hat\lambda'_{2\Omega}, \hat\lambda'_{3\Omega})$ were
above the line, i.e. $\hat\lambda'_{2\Omega} < \hat\lambda'_{3\Omega}$. If $\hat\lambda_{3\Omega} < \hat\lambda'_{3\Omega}$, then one can increase $L$ by
increasing $\hat\lambda_{3\Omega}$; if $\hat\lambda_{3\Omega} \ge \hat\lambda'_{3\Omega}$, $L$ can be increased by decreasing $\hat\lambda_{2\Omega}$. In both of these
cases the assumption that $L$ is maximized at $(\hat\lambda_{2\Omega}, \hat\lambda_{3\Omega})$ is violated. Hence,
whenever $\hat\lambda'_{2\Omega} < \hat\lambda'_{3\Omega}$, the "correct" maximum likelihood estimates are on the line,
$\lambda_3 = \lambda_2$, which is the $\omega$ region. Thus maximum likelihood corrects negative
estimates by making them zero. The maximum likelihood estimates are then

$$\hat\mu_\Omega = \hat\mu_\omega = \frac{z_{11}}{\sqrt{N}} \tag{7.12}$$

$$\hat\lambda_{2\Omega} = \hat\lambda_{3\Omega} = \hat\lambda_{2\omega} = \hat\lambda_{3\omega} = \frac{S_2 + S_3}{IJ}. \tag{7.13}$$
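
The argument above amounts to a simple rule, sketched here in code as an illustration (the function names are hypothetical): use the unrestricted solutions $S_2/I$ and $S_3/[I(J-1)]$ when they satisfy $\lambda_2 \ge \lambda_3$, and otherwise pool to $(S_2 + S_3)/(IJ)$. The helper `log_likelihood` is the part of the logarithm of (7.3) that depends on $\lambda_2$ and $\lambda_3$ after $\mu$ has been replaced by its estimate, so that the $s_1$ term vanishes; it is used only to check the rule on a grid.

```python
import math


def ml_estimates(s2, s3, I, J):
    """Restricted M.L. estimates of (lambda_2, lambda_3) subject to lambda_2 >= lambda_3."""
    lam2 = s2 / I                    # unrestricted solution for lambda_2
    lam3 = s3 / (I * (J - 1))        # unrestricted solution for lambda_3
    if lam2 >= lam3:
        return lam2, lam3
    pooled = (s2 + s3) / (I * J)     # boundary solution lambda_2 = lambda_3
    return pooled, pooled


def log_likelihood(lam2, lam3, s2, s3, I, J):
    """Log likelihood up to an additive constant, with mu at its estimate."""
    return -0.5 * (I * math.log(lam2) + I * (J - 1) * math.log(lam3) + s2 / lam2 + s3 / lam3)


if __name__ == "__main__":
    print(ml_estimates(10.0, 2.0, I=5, J=3))   # unrestricted solutions are admissible
    print(ml_estimates(1.0, 20.0, I=5, J=3))   # pooled estimate is used
```

When the unrestricted solutions violate the restriction, the estimate of $\sigma_a^2$ implied by (7.4) and (7.5) is $(\hat\lambda_2 - \hat\lambda_3)/J = 0$, in line with the remark that maximum likelihood corrects negative estimates by making them zero.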
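From (7.3) it can be seen that the square of the likelihood ratio is

$$R^2 = \frac{|\hat\Lambda_\Omega|}{|\hat\Lambda_\omega|}\exp\{z'\hat\Lambda_\Omega^{-1}z - z'\hat\Lambda_\omega^{-1}z\},$$

where $z' = \{z_{11} - \sqrt{N}\hat\mu, z_{21}, z_{31}, \ldots, z_{I1};\; z_{12}, \ldots, z_{IJ}\}$. The subscripts are
ordered as in (5.2) and $|\hat\Lambda_\Omega|$ and $|\hat\Lambda_\omega|$ are the maximum likelihood estimates
of $|\Lambda|$. Since both $z'\hat\Lambda_\Omega^{-1}z$ and $z'\hat\Lambda_\omega^{-1}z$ can be shown to equal $N$, by a procedure
given in the next section, $R^2 = |\hat\Lambda_\Omega| / |\hat\Lambda_\omega|$. By (7.6), this becomes

$$R^2 = \frac{\hat\lambda_{2\Omega}^{I}\,\hat\lambda_{3\Omega}^{I(J-1)}}{\hat\lambda_{2\omega}^{IJ}} \tag{7.14}$$

which is unity when $G < (J - 1)^{-1} = G_0$, say. For $G \ge G_0$, (7.9) to (7.13)
imply that

$$R^2 = \frac{J^{IJ}}{(J - 1)^{I(J-1)}}\,\frac{S_2^{I}\,S_3^{I(J-1)}}{(S_2 + S_3)^{IJ}}, \tag{7.15}$$

whence

$$R^{2/I} = K\,\frac{G}{(1 + G)^{J}}, \qquad K = J^{J}/[(J - 1)^{J-1}] > 0. \tag{7.16}$$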

For values of $R$ below one and values of $G$ above $(J - 1)^{-1}$, the L.R. test and
the $G$ or F test will now be shown to be equivalent. Since low values of $R$ are
significant, to show the equivalence of the two tests for this range of $G$ it is
only necessary to show that $R^{2/I}$ is a decreasing function of $G$ or that

$$\frac{d}{dG}\Big[\frac{G}{(1 + G)^{J}}\Big] = \frac{1 + G - JG}{(1 + G)^{J+1}} \tag{7.17}$$

is negative. Clearly, (7.17) is negative when $1 + G - JG < 0$ which is
equivalent to $G > (J - 1)^{-1} = G_0$. Also if $G = G_0$ in (7.16), $R = 1$. We have
already seen that $R = 1$ when $G < G_0$. Now, let $\alpha_0 = \Pr\{G > G_0\}$, which is
the probability that $R < 1$. Hence $1 - \alpha_0 = \Pr\{R = 1\}$. Thus the atomic
positive probability mass at $R = 1$ means that there are no L.R. tests of $\sigma_a^2 = 0$,
for the balanced, one-way classification, with level of significance greater than
$\alpha_0$ but less than 1. However, when an L.R. significance test does exist it is the
F-test. Since $F = I(J - 1)G/(I - 1)$, $G > G_0$ is equivalent to $F > F_0$, where
$F_0 = I/(I - 1)$. For all significance levels up to and including the 25 per cent
level [9] the percentage points of $F$ with $(I - 1)$ and $I(J - 1)$ degrees of
freedom for finite values of $I$ and $J$ greater than 1 are greater than $F_0$ while the 50
percentage points are less than $F_0$ for all these values of $I$ and $J$. Inasmuch as
it is unlikely that one wishes to use a significance level between 25 and 50 per
cent, for all practical purposes, the F-test and L.R. test are equivalent in the
case of the balanced one-way classification.
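
The behaviour of the likelihood ratio described in this paragraph can be sketched as a function of $G$. This is an illustration under the relations quoted in the text, namely $R^{2/I} = KG/(1+G)^{J}$ with $K = J^{J}/(J-1)^{J-1}$ for $G \ge G_0 = (J-1)^{-1}$ and $R = 1$ otherwise; the function names are hypothetical.

```python
def lr_power(g, J):
    """R**(2/I) as a function of G = S2/S3 for the balanced one-way layout (J >= 2)."""
    g0 = 1.0 / (J - 1)
    if g <= g0:
        return 1.0                        # atom of probability at R = 1
    K = J**J / (J - 1) ** (J - 1)
    return K * g / (1.0 + g) ** J


def f_from_g(g, I, J):
    """Usual F statistic, F = I(J - 1)G / (I - 1)."""
    return I * (J - 1) * g / (I - 1)


if __name__ == "__main__":
    I, J = 5, 4
    g0 = 1.0 / (J - 1)
    print(f_from_g(g0, I, J), I / (I - 1))   # G = G0 corresponds to F = F0 = I/(I - 1)
    for g in (0.1, g0, 0.5, 1.0, 2.0, 5.0):
        print(g, lr_power(g, J))             # equals 1 up to G0, then strictly decreasing
```

Because the function is constant up to $G_0$ and strictly decreasing beyond it, a critical region of the form $R < r$ with $r < 1$ is a region of the form $G > C$ with $C > G_0$, which is the equivalence used in the text.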

Although the F-tests of no population effect are the same under Models I 
and II, this quirk of the L.R. test does not exist in Model I. It is known that 
then the L.R. test is precisely the F-test. In Model I, the L.R. and the F-sta- 
tistic are strictly decreasing functions of one another and there is no positive 
probability mass at R = 1. 

It is of interest to note that there is a modified L.R. test which is equivalent
to the F-test for the Model II, balanced, one-way classification. One can reason
that if in (7.10) and (7.11) $\hat\lambda_{2\Omega} < \hat\lambda_{3\Omega}$, then the estimate of $\sigma_a^2$ as given by (7.4)
and (7.5) is negative. Then one way to modify or "correct" the estimates so
that the estimate of $\sigma_a^2$ is zero, is to use as estimates (although they are no longer
M.L.),

(7.18)    $\hat\lambda_{2\Omega} = \hat\lambda_{3\Omega} = \dfrac{S_3}{I(J-1)}.$


If these are put in (7.14), and if K is a positive constant 


r IJ 


which is a strictly decreasing function of G. Hence, if the estimates given by
(7.18) are used when $\hat\lambda_{3\Omega} > \hat\lambda_{2\Omega}$, this modified L.R. test is equivalent to the
F-test. Little can be said for this procedure, since the information in $S_2$ is not
used and when the estimate of $\sigma_a^2$ is negative one can argue almost as easily,
by ignoring the information in $S_3$, that the corrected estimates should be

(7.20)    $\hat\lambda_{2\Omega} = \hat\lambda_{3\Omega} = S_2/I.$


If these are used in (7.14), 


r lJ 


where K is again a positive constant. This is a strictly increasing function of G
and the modified test using it is certainly not equivalent to an F-test with large
values significant. 

8. Likelihood ratio test for the balanced two-way classification. This section
will be devoted to showing by means of a counter-example that when testing
$\omega: \sigma_a^2 = 0$ in the two-way classification, not only is the L.R. test not the F-test,
but (unlike the balanced one-way classification) it is not even equivalent to it for
small levels of significance. The L.R. test is a function of $S_2$, $S_3$ and $S_4$, while
the F-test is a function of only $S_2$ and $S_4$. From (5.5) the logarithm of the
likelihood for $\theta \in \Omega$, $\Omega$ given by (6.1), is


N 1 . 8; 
(8.1) Ly = — og 2 — 5 {log |a| + 2 *} 


where | A | = AsA2"A3*AG‘AS’. The s; are defined by (5.6) and the d’s by (3.3). 
Recall that for all 6 <2 


(8.2) M1 = Ae + As > Ay 
and 
(8.3) M2 2M; MI 2M 2M; Ay 2 As > O. 


Rather than maximize $L_\Omega$ subject to (8.2) we shall use a more general side
condition, use of which will be made below, namely to maximize $L_\Omega$ subject to
$\sum_{i=1}^{5} b_i\lambda_i = 0$ and $\sum_{i=1}^{5} c_i\lambda_i = 0$ by making use of Lagrange multipliers $\beta/2$ and
$\gamma/2$. Let

$M = L_\Omega + \dfrac{\beta}{2}\sum_{i=1}^{5} b_i\lambda_i + \dfrac{\gamma}{2}\sum_{i=1}^{5} c_i\lambda_i.$

Equating to zero the derivative of M with respect to $\beta$, $\gamma$, $\mu$, $\lambda_i$ $(i = 1, 2, \ldots, 5)$
one obtains

(8.4)    $\hat\mu = Z_{111}/\sqrt{N}$
(8.5)    $-\nu_i + \dfrac{s_i}{\hat\lambda_i} + \beta b_i\hat\lambda_i + \gamma c_i\hat\lambda_i = 0, \qquad i = 1, 2, \ldots, 5,$
(8.6)    $\sum_{i=1}^{5} b_i\hat\lambda_i = 0, \qquad \sum_{i=1}^{5} c_i\hat\lambda_i = 0,$

where the carets indicate that these are the maximizing values. Adding the five
equations in (8.5) and making use of (8.6) we obtain (with $\nu_1 = 1$)

$\sum_{i=1}^{5} \dfrac{s_i}{\hat\lambda_i} = \sum_{i=1}^{5} \nu_i = N,$

and the exponent in (5.5) is $-N/2$ when the maximizing values of the parameters
are used. Thus the well-known result [18] when there is no condition on
the $\lambda$'s is also true if the $\lambda$'s are linearly dependent.
Under $\Omega$, $b_1 = b_4 = 1$, $b_2 = b_3 = -1$, $b_5 = 0$, $c_i = 0$ $(i = 1, 2, \ldots, 5)$ and
(8.4)-(8.6) become 


(8.7) fo = Zin JN 


hi —8 =0 


vedo — Soda + B = 0 
(8.8) st ia 

vsA\30 = S3A30 + B = 0 

vden <_ San — B= 0 
(8.9) heo = Ss V5. 


If the solutions of (8.2) and (8.8) satisfy (8.3), they are also M.L. estimates. 
If not then one would have to get the “correct”? M.L. estimates by some pro- 
cedure similar to the one used in the previous section. This will be unnecessary 
because we shall show that even when these solutions satisfy (8.3) the L.R. 
statistic is not a function of F alone. Hereafter we confine ourselves to the part 
of the z space where (8.3) is satisfied by the stationary values. Eliminating the 
Lagrange multiplier, and performing some simplifications one may write (8.2) 
and (8.8) as 


dio ay hea + Asa oT eo 


oe i. con 
Serve = ve + AsoA19 


(8.10) a Seetig 
Ss\sxa = vs + AsoAia 
Siva = i “f Mode 

Similarly, the logarithm of the likelihood under wios = Oord, = dz, = AG 


is given by (8.1) subject to 
b, = b, = 1, bo = bs = —1, bs = 0 
G = i, Gq = —i, Ci = C3 = Cs = 0. 
Using this last condition (8.4)—(8.6) can be simplified to 
fio = Zn/VN 
how = Ss/I 
hwo = (Se + Sa)/I(J — 1) 


hs» = Ss/IJ(K — 1). 


(8.11) 


As in the estimate under © we treated only the case when the stationary values 
satisfy Ay» 2 Aw 2 Asw > O. 
Since the exponent of the likelihood when the estimates under Q or w are 








MODEL II ANALYSIS OF VARIANCE 955 


inserted has been shown above to equal —N/2, the square of the likelihood 
ratio is 


(8.12) Re = |Aol _ 
| A. | 


where 
(8.13) Io = (hea + deo — daa) Aad N98 15K 
(8.14) Li, = (Ss/T)"[(S2 + S4)/Iv:)""* 


and \i., i = 2, 3, 4 satisfy (8.10). We have seen that the F-test of w:o, = 0 
is a function of S. and S, alone and does not depend on S;. It appears that 
R® may depend on S; since its denominator, Li, does. However La may equal 
S3 times a factor independent of S;, in which case R’ will be independent of 
S;. It was shown [3], by comparing the solutions, (8.12) for two examples 
differing only in values for s;, that R’ does depend on S;. 

In Section 10 it will be shown that both the L.R. and F-tests are invariant 
tests, but the F-test is to be preferred since it has an optimum property, namely, 
of being the u.m.p. similar test. 


9. Uniformly most powerful invariant test in the balanced, one-way classification
is the F-test. It will be shown that for the balanced, one-way classification,
when the standard variable Z has distribution (7.3) the u.m.p. invariant
test of $\omega: \sigma_a^2 = 0$ against $\Omega - \omega: \sigma_a^2 > 0$ (or, using (7.4) and (7.5), of $\omega: \theta = 1$
against $\Omega - \omega: \theta > 1$, where $\theta = \lambda_2/\lambda_3$) is the F-test. We partition the Z vector
as follows. Let $Z' = [Z_{(1)}, Z'_{(2)}, Z'_{(3)}]$ where $Z_{(1)} = Z_{11}$, $Z_{(2)}$ is the column vector
whose elements are $Z_{i1}$, $i = 2, 3, \ldots, I$, and $Z_{(3)}$ is the column vector whose
elements are $Z_{ij}$; $i = 1, 2, \ldots, I$; $j = 2, 3, \ldots, J$. The elements of $Z_{(2)}$ and
$Z_{(3)}$ may be ordered in any way. Clearly the problem remains invariant under
the following groups of transformations, each of which is a normal subgroup
of the product group of the previous ones:

(9.1)    $Z^*_{(1)} = Z_{(1)} + c, \qquad Z^*_{(a)} = Z_{(a)}; \quad a = 2, 3.$
(9.2)    $Z^*_{(a)} = D_{(a)} Z_{(a)}; \quad D_{(a)}$ orthogonal, $a = 1, 2, 3.$
(9.3)    $Z^*_{(a)} = c Z_{(a)}; \quad a = 1, 2, 3; \; c \neq 0.$

A maximal invariant [8] under the product of all three groups is

$G = \dfrac{\sum_{i=2}^{I} Z_{i1}^2}{\sum_{i=1}^{I}\sum_{j=2}^{J} Z_{ij}^2} = \dfrac{S_2}{S_3}.$

It may be pointed out that unlike Model I the group of orthogonal trans- 
formations is unnecessary if we agree to base all decisions on the sufficient
statistic $(Z_{11}, S_2, S_3)$ of Section 7. Starting with this statistic the first and third
group of transformations (additive and multiplicative group) will lead to G as a 
maximal invariant in the class of sufficient statistics. 
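
The invariance of G under the additive and multiplicative groups can be illustrated numerically. This is a minimal sketch under the partition described above, with the standardized variables stored as an I x J array (row 0, column 0 holding $Z_{11}$); the function names and the array layout are assumptions made for illustration.

```python
import random


def g_statistic(z):
    """G = S2 / S3 for an I x J table z of standardized variables Z_ij.

    S2 sums Z_i1**2 over i = 2..I (first column, skipping Z_11);
    S3 sums Z_ij**2 over all i and j = 2..J (every column but the first).
    """
    s2 = sum(row[0] ** 2 for row in z[1:])
    s3 = sum(x ** 2 for row in z for x in row[1:])
    return s2 / s3


def shift_and_scale(z, shift, scale):
    """Apply Z_11 -> Z_11 + shift, then multiply every element by scale."""
    out = [list(row) for row in z]
    out[0][0] += shift
    return [[scale * x for x in row] for row in out]


if __name__ == "__main__":
    rng = random.Random(0)
    z = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
    print(g_statistic(z), g_statistic(shift_and_scale(z, 7.5, -2.0)))
```

The shift touches only $Z_{11}$, which enters neither sum, and a common scale factor cancels in the ratio, so the two printed values agree up to rounding.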











956 LEON H. HERBACH 


To show that the test which determines the critical region G > c (or the
equivalent F-test which rejects $\omega$ when $W = \nu_3 G/\nu_2 > c$) is the u.m.p. invariant
test one need only show it is the u.m.p. test based on G. Under $\omega$, W is distributed
as F with $\nu_2$ and $\nu_3$ degrees of freedom, while, under $\Omega - \omega$, it is distributed as
$\theta$ times F with $\nu_2$ and $\nu_3$ degrees of freedom, i.e. the probability density of G is

(9.4)    $p_\theta(g) = c\,\theta^{\nu_3/2}\, g^{(\nu_2-2)/2}\, (\theta + g)^{-(\nu_2+\nu_3)/2}, \qquad \theta \ge 1,$

where $\theta = \lambda_2/\lambda_3$. By the Neyman-Pearson lemma the most powerful test of
$\theta = 1$ based on G against a particular alternative $\theta = \theta' > 1$ is given by
$\varphi(g) = 1$ when

$\theta'^{\,\nu_3/2} \left(\dfrac{1+g}{\theta'+g}\right)^{(\nu_2+\nu_3)/2} > c$

or

(9.5)    $\dfrac{1+g}{\theta'+g} > c.$


The left member of (9.5) is an increasing function of g, since its derivative
with respect to g is $(\theta' - 1)/(\theta' + g)^2$ which is positive. Hence this test is
equivalent to $\varphi(g) = 1$ when g > c. Since the value of c is determined by integrating
the upper tail of (9.4) for $\theta = 1$, it is not dependent on the particular alternative.
Thus for the balanced one-way classification one may replace the class of
similar tests by the somewhat more reasonable class of invariant tests and show
that in this more reasonable class the usual F-test of $\sigma_a^2 = 0$ against $\sigma_a^2 > 0$ is
also u.m.p. 
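
The monotonicity that drives the argument can be checked directly. The sketch below assumes a density of G proportional to $\theta^{\nu_3/2} g^{(\nu_2-2)/2}(\theta+g)^{-(\nu_2+\nu_3)/2}$, as follows from G being $\theta$ times a ratio of independent chi-square variables; the function name is illustrative.

```python
def density_ratio(g, theta_alt, nu2, nu3):
    """Ratio p_theta'(g) / p_1(g) under the assumed density of G.

    The normalizing constant and the power of g cancel, leaving
    theta'**(nu3/2) * ((1 + g) / (theta' + g))**((nu2 + nu3)/2).
    """
    base = (1.0 + g) / (theta_alt + g)
    return theta_alt ** (nu3 / 2.0) * base ** ((nu2 + nu3) / 2.0)


if __name__ == "__main__":
    nu2, nu3 = 4, 15  # e.g. I = 5 groups, J = 4 observations per group
    for theta_alt in (1.5, 3.0, 10.0):
        values = [density_ratio(0.25 * k, theta_alt, nu2, nu3) for k in range(40)]
        increasing = all(a < b for a, b in zip(values, values[1:]))
        print(theta_alt, increasing)
```

Because the ratio increases in g for every alternative $\theta' > 1$, the rejection region is always of the form g > c, which is why the same test is most powerful against each alternative.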


10. Invariance in the balanced two-way classification. It will now be shown
why there may not be any uniformly most powerful invariant test in the case
of the balanced two-way classification. We are interested in the test of $\omega: \sigma_a^2 = 0$
against $\Omega - \omega: \sigma_a^2 > 0$ (or, if we let $\psi_1 = \lambda_2/\lambda_4$, of testing $\omega: \psi_1 = 1$ against
$\Omega - \omega: \psi_1 > 1$) for the standard variate Z whose distribution is given by (5.5).
The group of transformations analogous to those in the last section will be
considered. As in that section we partition the Z vector thus:

$Z' = (Z_{(1)}, Z'_{(2)}, Z'_{(3)}, Z'_{(4)}, Z'_{(5)}),$

where $Z_{(1)} = Z_{111}$ and $Z_{(a)}$, for a = 2, 3, 4, 5, is the column vector of the Z's
(in any order) appearing in the sums $S_a$ of (5.6). The problem remains invariant
under the same types of groups of transformations as in the preceding section,
namely (9.1) for a = 2, 3, 4, 5 and (9.2), (9.3) for a = 1, ..., 5. A maximal
invariant under the product group of the three groups is U, V, W where

(10.1)    $U = S_2/S_4, \qquad V = S_3/S_4, \qquad W = S_4/S_5.$

Any test based on U, V, W will have power based on the maximal invariant
induced in the parameter space, namely,

(10.2)    $\psi = (\psi_1, \psi_2, \psi_3),$










where 


(10.3)    $\psi_1 = \lambda_2/\lambda_4, \qquad \psi_2 = \lambda_3/\lambda_4, \qquad \psi_3 = \lambda_4/\lambda_5.$


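As in the balanced one-way classification (Section 9), the orthogonal group of
transformations corresponding to (9.2) is unnecessary if we agree to base all
decisions on the sufficient statistic $(Z_{111}, S_2, S_3, S_4, S_5)$ of Section 5.

By transforming the density of $(S_2, S_3, S_4, S_5)$ as given in (5.9) to that of
$(S_5, U, V, W)$ and integrating out $s_5$ we obtain [3]

(10.4)    $p_\psi(u, v, w) = \dfrac{\Gamma\!\left(\frac{\nu_2+\nu_3+\nu_4+\nu_5}{2}\right) u^{\frac{\nu_2-2}{2}}\, v^{\frac{\nu_3-2}{2}}\, w^{\frac{\nu_2+\nu_3+\nu_4-2}{2}}}{\prod_{i=2}^{5}\Gamma\!\left(\frac{\nu_i}{2}\right)\, \psi_1^{\nu_2/2}\, \psi_2^{\nu_3/2}\, \psi_3^{\frac{\nu_2+\nu_3+\nu_4}{2}}\, \delta^{\frac{\nu_2+\nu_3+\nu_4+\nu_5}{2}}}$

where

(10.5)    $\delta = \delta(u, v, w; \psi) = \dfrac{uw}{\psi_1\psi_3} + \dfrac{w}{\psi_3} + \dfrac{vw}{\psi_2\psi_3} + 1.$

This shows that the density of u, v, w is indeed dependent on $\lambda$ only through
the maximal invariant $\psi = (\psi_1, \psi_2, \psi_3)$. The Neyman-Pearson lemma gives as
the most powerful test of $H_0: \lambda = \lambda^0$ against $H_1: \lambda = \lambda^1$ (where $\lambda^i = (\lambda_2^i, \lambda_3^i,
\lambda_4^i, \lambda_5^i)$, $\lambda_2^i/\lambda_4^i = \psi_1^i$, i = 0, 1 and $\psi_1^0 = 1 < \psi_1^1$), based on (U, V, W) the one
which rejects $H_0$ when

(10.6)    $\dfrac{p_{\psi^1}(u, v, w)}{p_{\psi^0}(u, v, w)} = \left[\dfrac{\dfrac{w(u+1)}{\psi_3^0} + \dfrac{vw}{\psi_2^0\psi_3^0} + 1}{\dfrac{uw}{\psi_1^1\psi_3^1} + \dfrac{w}{\psi_3^1} + \dfrac{vw}{\psi_2^1\psi_3^1} + 1}\right]^{\frac{\nu_2+\nu_3+\nu_4+\nu_5}{2}} \left(\dfrac{\psi_1^0}{\psi_1^1}\right)^{\frac{\nu_2}{2}} \left(\dfrac{\psi_2^0}{\psi_2^1}\right)^{\frac{\nu_3}{2}} \left(\dfrac{\psi_3^0}{\psi_3^1}\right)^{\frac{\nu_2+\nu_3+\nu_4}{2}} > c.$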

Since the distributions of V and W depend on $\psi_2$ and $\psi_3$ under $H_0$ there seems
to be little likelihood of obtaining a uniformly most powerful invariant test
based on a statistic involving U, V and W from (10.6). It was not obvious
from the fact that the maximal invariant was vector valued that no u.m.p.
invariant test exists, since conceivably (10.6) might involve only one of the
elements of the vector. For example if (10.6) were a function of U alone then
once again the usual F-test would be uniformly most powerful. Although our
probability ratio (10.6) showed that there is no u.m.p. invariant test based on
the given product group of transformations, there still may be one with respect
to a larger group of transformations. For example, if in the last section we had
stopped after the second group of transformations, obtaining as a maximal
invariant $S_2$, $S_3$ (rather than $S_2/S_3$), a situation analogous to (10.6) would have
resulted. This may mean that another group of transformations, unknown to
the author, may leave the problem invariant in the case of the balanced two-way
classification, with the maximal invariant under the product of the four groups
being U.
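
That the probability ratio is not a function of U alone can be seen numerically. The sketch below assumes $U = S_2/S_4$, $V = S_3/S_4$, $W = S_4/S_5$, $\psi = (\lambda_2/\lambda_4, \lambda_3/\lambda_4, \lambda_4/\lambda_5)$ and $\delta = uw/(\psi_1\psi_3) + w/\psi_3 + vw/(\psi_2\psi_3) + 1$, which is the form obtained by writing $\sum_{i=2}^{5} S_i/\lambda_i$ as $(S_5/\lambda_5)\,\delta$; the function names are hypothetical.

```python
def delta(u, v, w, psi):
    """delta(u, v, w; psi) under the assumed definitions of U, V, W and psi."""
    psi1, psi2, psi3 = psi
    return u * w / (psi1 * psi3) + w / psi3 + v * w / (psi2 * psi3) + 1.0


def np_ratio(u, v, w, psi_alt, psi_null, nu):
    """Density ratio p_{psi_alt} / p_{psi_null}; powers of u, v, w cancel."""
    nu2, nu3, nu4, nu5 = nu
    half_total = (nu2 + nu3 + nu4 + nu5) / 2.0
    const = ((psi_null[0] / psi_alt[0]) ** (nu2 / 2.0)
             * (psi_null[1] / psi_alt[1]) ** (nu3 / 2.0)
             * (psi_null[2] / psi_alt[2]) ** ((nu2 + nu3 + nu4) / 2.0))
    return const * (delta(u, v, w, psi_null) / delta(u, v, w, psi_alt)) ** half_total


if __name__ == "__main__":
    nu = (2, 3, 6, 12)            # I = 3, J = 4, K = 2
    psi_null = (1.0, 2.0, 3.0)    # psi_1 = 1 under the hypothesis
    psi_alt = (2.0, 2.0, 3.0)     # only psi_1 changes
    # Same u, different v: the ratio changes, so it is not a function of u alone.
    print(np_ratio(1.0, 1.0, 1.0, psi_alt, psi_null, nu))
    print(np_ratio(1.0, 5.0, 1.0, psi_alt, psi_null, nu))
```

Even when only $\psi_1$ differs between hypothesis and alternative, the ratio varies with v (and w) at fixed u, which is the obstruction to a u.m.p. invariant test described in the text.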

Even if there are no further invariant transformations, an optimum test in
this case can be obtained by decreasing the class of invariant tests. We have seen
that a maximal invariant under G is the vector consisting of any three independent
ratios of $S_2$, $S_3$, $S_4$, $S_5$ and that G induced a group $\bar{G}$ under which a maximal
invariant in the parameter space is the vector composed of the corresponding
three ratios of $\lambda_2$, $\lambda_3$, $\lambda_4$, $\lambda_5$. But $\psi_1 = \lambda_2/\lambda_4$ (or its reciprocal) seems to be the
only one that is independent of nuisance parameters under $\omega$. Also $S_2/S_4$ (or its
reciprocal) appears to be the only part of the maximal invariant under G whose
distribution is a function of $\psi_1$ only. Thus, it seems reasonable to restrict our class
to $S_2/S_4$. Then we obtain a u.m.p. test as in the last section. We now show

THEOREM 10.1: Of all invariant tests of $\omega: \sigma_a^2 = 0$ against $\Omega - \omega: \sigma_a^2 > 0$
in the balanced two-way classification whose power is a function of $\psi_1$ only, the usual
F-test is most powerful.

PROOF: If it can be shown that $S_2/S_4$ is the only invariant statistic whose power
is a function of $\psi_1$ only, the above assertion is true. However we have already
shown a stronger result in Section 6, which includes this result, namely, the
usual F-test is the u.m.p. similar test. Similarity in this example means

$E_\theta\,\varphi(X) = \alpha, \qquad \theta \in \omega$ (i.e. $\psi_1 = 1$),

while we want our test to satisfy

$E_\psi\,\varphi(X) = \text{const} = \alpha$, say, for $\psi \in \omega$ (i.e. $\psi_1 = 1$),
$E_\psi\,\varphi(X) = f(\psi_1)$ for $\psi \in \Omega - \omega$ (i.e. $\psi_1 > 1$),
$X = h(U, V, W).$

By $\psi \in \omega$ we mean that the components of $\psi$ satisfy (6.2). There clearly is a
similar test for every invariant test which is a function of $\psi_1$ only. Since the u.m.p.
similar test is based on U, an invariant statistic, Theorem 10.1 is proved. Of
course invariance added nothing in this case.


11. Balanced multi-way classifications. The procedure of Section 5 can be 
used to obtain the standard form for any balanced multi-way classification. 
The evaluation of |  — Ag | just becomes a little more tedious as the number 
of factors increases. Of special interest is the case of the multi-fold, hierarchical 
or nested classification [6] model which is very useful in survey sampling theory 
[1]. The three-fold classification may be represented by 


$X_{ijkm} = \mu + e_i^{A} + e_{ij}^{AB} + e_{ijk}^{ABC} + e_{ijkm}$

with $\mu$ a constant, $e_i^{A}$, $e_{ij}^{AB}$, $e_{ijk}^{ABC}$, $e_{ijkm}$ normally and independently distributed
with means zero and variances $\sigma_a^2$, $\sigma_{ab}^2$, $\sigma_{abc}^2$, $\sigma_e^2$ and the range of subscripts as
usual. In this special case the hypothesis that any variance component, except
$\sigma_e^2$, equals zero can be tested by an F-test and the method of Section 6 can be
used to show these tests are u.m.p. similar. Even in this special case the methods
of Section 9 cannot be used to show u.m.p. invariance unless the multi-fold 
classification is one-fold, which is the same as the one-way case treated in Sec- 
tion 9. However, Gautschi’s [17] lemma must be used to prove that the usual 
F-tests are u.m.p. similar in the non-hierarchical multi-way classifications, when 
there are more than two classifications. 


12. Acknowledgements. The author is indebted to many people, especially 
to Drs. H. Scheffé, H. Raiffa, T. W. Anderson and Erich Lehmann for helpful


SOME REMARKS ON HERBACH’S PAPER, “OPTIMUM NATURE OF
THE F-TEST FOR MODEL II IN THE BALANCED CASE”¹


By WERNER GAUTSCHI†
Indiana University


1. Summary. The purpose of this note is to present a lemma which will settle
a question of completeness left open in Section 6 of the above mentioned paper
[5]. We give two applications of the lemma,

(i) by proving that, in addition to Herbach’s results, also the standard F-test
for $\sigma_{ab}^2 = 0$ is a uniformly most powerful similar test,

(ii) by pointing out that the standard form introduced in [5] together with
our lemma provide convenient tools to prove that in a balanced model II design
(with the usual normality assumptions) the standard estimates of variance
components are minimum variance unbiased. This result is well known ([2], [3]) and
it has in fact been pointed out by Graybill and Wortham [3] that a completeness
argument may be used to demonstrate the minimum variance property of the
usual estimators for the variance components. The present lemma shows that
the estimators do indeed have the necessary completeness property. We will
follow Herbach’s notation throughout.


2. A completeness lemma. The following lemma guarantees completeness for a 
certain class of probability densities to which the results of Lehmann and 
Scheffé do not apply directly. It takes care of a difficulty mentioned in [5], Section 
6, which is caused when $g(\theta)$ does not equal one of the $\theta_i$ $(i = 2, \ldots, r)$. If
$g(\theta)$ does, the product-densities could immediately be reduced to the exponential
form considered by Lehmann and Scheffé in [7], Theorem 7.3. Our lemma is 
more general than the Lehmann and Scheffé Theorem 7.1 [7] in the sense that 
we allow instead of their g¢-(x”) to have go’ s(x”) which, however, we assume 
to factor into her (x” )hoe(x”) with he: (x”) > 0 and {he(x”) du} strongly com- 
plete. It is of course more special in that we take both yw” and yp” as Lebesgue 
measure and for go (2’), ge’ (x”) specific functions. Our proof is modelled 
along the same lines as the one given by Lehmann and Scheffé in [7] p. 221. 

LEMMA: Let

$\mathfrak{P}' = \{P_\theta ; \theta \in D\}, \qquad t = (t_2, \ldots, t_r), \quad \theta = (\theta_2, \ldots, \theta_r)$
$\mathfrak{P}'' = \{P_{\theta_1,\theta} ; (\theta_1, \theta) \in D_1 \times D\}, \qquad \theta_1$ real

1 This is a cut-down version of a paper in which the author independently considered
standard forms for model II designs. He acknowledges, however, the priority of Dr. Her- 
bach’s approach (see [4] as compared to [1]) and restricts himself to giving some results 
supplementing those of Herbach. 

t Werner Gautschi died on October 3, 1959. Editor. 




be two families of probability measures on the Borel sets of the Euclidean space
$E_{r-1}$ and the real line $E_1$ respectively, having the densities

(1)    $p_\theta(t) = c(\theta)\, h(t_2, \ldots, t_r)\, e^{\sum_{i=2}^{r} \theta_i t_i}$
(2)    $p_{\theta_1,\theta}(t_1) = c(\theta_1, \theta)\, e^{g(\theta) t_1^2 + \theta_1 t_1}$

with respect to Lebesgue measure. If $D_1$ is the real line and D a Borel set in $E_{r-1}$
containing a non-degenerate (r − 1)-dimensional interval then the family of product
measures $\mathfrak{P} = \{P_{\theta_1,\theta} \times P_\theta ; (\theta_1, \theta) \in D_1 \times D\}$ is strongly complete (in the sense
of Lehmann and Scheffé [7]).

PROOF: Suppose

(3)    $I = \int\!\!\int f(t_1, t)\, p_{\theta_1,\theta}(t_1)\, p_\theta(t)\, dt_1\, dt = 0 \quad (\text{a.e. } L^{\theta_1,\theta}).^2$

Let N be the set of parameter points $(\theta_1, \theta)$ for which $I \neq 0$. If $N_\theta$ denotes the
$\theta$-section of N, i.e. $N_\theta = \{\theta_1 ; (\theta_1, \theta) \in N\}$, then $L^{\theta_1}(N_\theta) = 0$ except possibly for
$\theta \in N_0$, where $L^{\theta}(N_0) = 0$.

According to Fubini’s theorem we may write

$I = \int p_{\theta_1,\theta}(t_1)\, \Phi(t_1, \theta)\, dt_1,$

where $\Phi(t_1, \theta) = \int f(t_1, t)\, p_\theta(t)\, dt$. Since $p_{\theta_1,\theta}(t_1) > 0$, for fixed $\theta \notin N_0$ the
exceptional set of points $t_1$ for which the integral defining $\Phi(t_1, \theta)$ does not exist
has $L^{t_1}$-measure zero. Furthermore, if $\theta \notin N_0$, we can, in virtue of (2), rewrite
(3) as

$\int e^{\theta_1 t_1} \left[ e^{g(\theta) t_1^2}\, \Phi(t_1, \theta) \right] dt_1 = 0 \quad (\text{a.e. } L^{\theta_1}), \qquad \theta \notin N_0.$

From the unicity property of the bilateral Laplace transform (see, for instance,
[8], Ch. VI, Theorem 6b) it follows that

$\Phi(t_1, \theta) = 0 \quad (\text{a.e. } L^{t_1}), \qquad \theta \notin N_0.$

Thus, if S denotes the (measurable) set of points $(t_1, \theta)$ for which $\Phi$ is either
not defined or $\neq 0$, almost every $\theta$-section of S has $L^{t_1}$-measure zero, hence
$L^{t_1,\theta}(S) = 0$.

This in turn implies that almost all $t_1$-sections of S have $L^{\theta}$-measure zero, i.e.

$\Phi(t_1, \theta) = \int f(t_1, t)\, p_\theta(t)\, dt = 0 \quad (\text{a.e. } L^{\theta}) \quad \text{if } t_1 \notin N_1,$

where $L^{t_1}(N_1) = 0$. Since the family of probability densities $p_\theta(t)$ is strongly
complete (Lehmann and Scheffé [7], Theorem 7.3) we conclude

$f(t_1, t) = 0 \quad (\text{a.e. } \mathfrak{P}'), \qquad t_1 \notin N_1,$

from which $f(t_1, t) = 0$ (a.e. $\mathfrak{P}$) follows immediately.

2 L with a superscript denotes Lebesgue measure. The superscript indicates the space
on which the measure is taken.


3. Applications. (a) Tests of hypotheses in balanced model II designs. Consider the
balanced two-way classification ([5], Section 6) and the hypothesis $\omega: \sigma_{ab}^2 = 0$.
The statistic

$T_1 = Z_{111}, \qquad T_2 = S_2, \qquad T_3 = S_3, \qquad T_4 = S_4 + S_5$

is not only sufficient under $\omega$ but also complete on $\omega$. In fact, if we let

$\theta_1 = \dfrac{\sqrt{N}\,\mu}{\lambda_1}, \qquad \theta_2 = -\dfrac{1}{2\lambda_2}, \qquad \theta_3 = -\dfrac{1}{2\lambda_3}, \qquad \theta_4 = -\dfrac{1}{2\lambda_4},$

the densities of $T_1$ and $T = (T_2, T_3, T_4)$ are easily recognized to have the
form given in our lemma. Proceeding therefore in the same fashion as in [5],
Section 6, we would find that also the standard F-test of the hypothesis $\omega: \sigma_{ab}^2 = 0$
is a uniformly most powerful similar test. The same situation prevails in higher
order classifications. As is well known, in a complete n-way classification F-tests
exist for the non-existence of any one of the (n − 1)st or (n − 2)nd order
interactions. All these tests are uniformly most powerful similar tests.

(b) Point estimation in balanced model II designs. To fix the ideas consider
the standard form for the balanced two-way classification. A sufficient statistic
for the parameters involved is

(4)    $T_1 = Z_{111}, \qquad T_2 = S_2, \quad \ldots, \quad T_5 = S_5.$

If we let

$\theta_1 = \dfrac{\sqrt{N}\,\mu}{\lambda_2 + \lambda_3 - \lambda_4}, \qquad \theta_2 = -\dfrac{1}{2\lambda_2}, \quad \ldots, \quad \theta_5 = -\dfrac{1}{2\lambda_5},$

the densities of $T_1$ and $T = (T_2, \ldots, T_5)$ are again of the form given in our
lemma and thus the statistic (4) is complete on $\Omega$. Unbiased estimates for the
variance components, in terms of (4), are

(5)    $\hat\sigma_e^2 = \dfrac{T_5}{\nu_e}, \qquad \hat\sigma_{ab}^2 = \dfrac{1}{K}\left[\dfrac{T_4}{\nu_{ab}} - \dfrac{T_5}{\nu_e}\right], \qquad \hat\sigma_b^2 = \dfrac{1}{IK}\left[\dfrac{T_3}{\nu_b} - \dfrac{T_4}{\nu_{ab}}\right],$

       $\hat\sigma_a^2 = \dfrac{1}{JK}\left[\dfrac{T_2}{\nu_a} - \dfrac{T_4}{\nu_{ab}}\right],$

where $\nu_a = I - 1$, $\nu_b = J - 1$, $\nu_{ab} = (I - 1)(J - 1)$, $\nu_e = IJ(K - 1)$, and
are therefore minimum variance unbiased estimates ([6], Theorem 5.1). On the
other hand the standard estimates in terms of the various mean squares have
the same distribution as those in (5) and must consequently be of minimum
variance among all unbiased estimates based on the original observation vector X.
Higher order layouts could be treated in a similar manner. 
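
The estimates can be computed directly from the four sums of squares. The sketch below assumes the usual expected mean squares of the balanced two-way random model ($E[S_5/\nu_e] = \sigma_e^2$, $E[S_4/\nu_{ab}] = \sigma_e^2 + K\sigma_{ab}^2$, $E[S_3/\nu_b] = \sigma_e^2 + K\sigma_{ab}^2 + IK\sigma_b^2$, $E[S_2/\nu_a] = \sigma_e^2 + K\sigma_{ab}^2 + JK\sigma_a^2$) and that $S_2$, $S_3$ correspond to factors A and B; the function name and dictionary keys are illustrative.

```python
def variance_component_estimates(S2, S3, S4, S5, I, J, K):
    """Unbiased variance-component estimates from the sums of squares.

    Assumes S2, S3, S4, S5 belong to factor A, factor B, the interaction
    and error, with I, J levels and K replicates per cell.
    """
    nu_a, nu_b = I - 1, J - 1
    nu_ab, nu_e = (I - 1) * (J - 1), I * J * (K - 1)
    ms_a, ms_b = S2 / nu_a, S3 / nu_b
    ms_ab, ms_e = S4 / nu_ab, S5 / nu_e
    return {
        "sigma2_e": ms_e,
        "sigma2_ab": (ms_ab - ms_e) / K,
        "sigma2_b": (ms_b - ms_ab) / (I * K),
        "sigma2_a": (ms_a - ms_ab) / (J * K),
    }


if __name__ == "__main__":
    # Sums of squares set equal to their expectations for
    # sigma2_a = 2, sigma2_b = 1.5, sigma2_ab = 0.5, sigma2_e = 1 (I, J, K = 3, 4, 2).
    print(variance_component_estimates(36.0, 33.0, 12.0, 12.0, 3, 4, 2))
```

As with any difference of mean squares, these estimates can come out negative in a particular sample; unbiasedness refers only to their expectations.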













REFERENCES 


[1] W. GAUTSCHI, “On an optimal property of variance-components estimates” (Abstract),
Ann. Math. Stat., Vol. 28 (1957), p. 1058.

[2] F. A. GRAYBILL, “On quadratic estimates of variance components”, Ann. Math. Stat.,
Vol. 25 (1954), pp. 367-372.

[3] F. A. GRAYBILL AND A. W. WORTHAM, “A note on uniformly best unbiased estimators
for variance components”, J. Amer. Stat. Assn., Vol. 51 (1956), pp. 266-268.

[4] L. H. HERBACH, “Topics in analysis of variance: A. Optimum properties of tests
for model II, B. Generalizations of model II” (Abstract), Ann. Math. Stat.,
Vol. 24 (1953), p. 137.

[5] L. H. HERBACH, “Properties of model II-type analysis of variance tests, A: Optimum
nature of the F-test for model II in the balanced case”, Ann. Math. Stat., Vol.
30 (1959), pp. 939-959.

[6] E. L. LEHMANN AND H. SCHEFFÉ, “Completeness, similar regions and unbiased estimation,
Part I”, Sankhyā, Vol. 10 (1950), pp. 305-340.

[7] E. L. LEHMANN AND H. SCHEFFÉ, “Completeness, similar regions and unbiased estimation,
Part II”, Sankhyā, Vol. 15 (1955), pp. 219-236.

[8] D. V. WIDDER, The Laplace Transform, Princeton University Press, 1941.








THE MOST-ECONOMICAL CHARACTER OF SOME BECHHOFER AND
SOBEL DECISION RULES¹


By Wm. JACKSON HALL


University of North Carolina


1. Introduction. R. E. Bechhofer [1] has considered a single-sample multiple-decision
procedure for choosing, among a group of normal populations with
common known variances, that population with the largest mean, and, with M.
Sobel [2], a procedure for choosing the normal population with the smallest
variance. Several other analogous problems have also been considered.² They
suggest, with only intuitive justification, choosing the population with the
largest (smallest) sample mean (variance), and give tables for finding the
minimum sample size (assumed equal for all populations) which will guarantee
a correct decision with prescribed probability when the extreme population
parameter is sufficiently distinct from the others. This paper gives justification
for a wide class of such procedures, proving that no other rules can meet this
guarantee with a smaller (fixed) sample size; that is, such rules are most
economical [4].

Proof of the most-economical character of these rules is achieved by proving
their minimax character when a suitable loss function is introduced. R. R.
Bahadur and L. A. Goodman [5] have considered a class of multiple-decision
rules which they have called impartial (invariant under permutations of the
populations). Their results are applicable to such problems of choosing the best
population and imply that Bechhofer and Sobel’s rules are minimax rules (in
fact, uniformly minimum risk rules) among the class of impartial decision rules.
The present paper removes this restriction of impartiality. Thus, in the present
context, impartiality is no restriction when looking for minimax rules, as is
well-known to be the case for certain other kinds of invariance.

The main result is stated in Section 2 and proved in Section 3. It is applicable
to any analogous problem of choosing the population with the most extreme
parameter when, for each sample, there is a numerical sufficient statistic with a
monotone likelihood ratio³ and the (numerical) parameter is a location or scale (but
not range) parameter⁴ in the distribution of the statistic. The theorem is applicable
to Bechhofer’s procedure and the corollary to Bechhofer and Sobel’s. (In the
latter example, if the means are unknown, it will be necessary to invoke invariance
under changes in scale.) In Section 4, the result is further extended to problems


Received November 23, 1958; revised March 26, 1959. 

1 This research was supported by the United States Air Force Office of Scientific Research
of the Air Research and Development Command, under Contract No. AF 49(638)-261.
2 For a review, see the introduction in [3].

3 For definition, see [6], for example.
of ranking the populations according to the parameter values, or of grouping
them by ranks, as formulated by Bechhofer [1]. 

The requirement that the parameter be one of location or scale is dropped in 
Section 5. Then the guarantee holds only at a specified location; for many prob- 
lems, a least favorable location can be determined so that the guarantee can be 
made to hold irrespective of location. For example, the procedures of M. Sobel 
and M. J. Huyett [7] for choosing the largest of several binomial parameters are 
proved to be most economical. In Section 6, the broader optimality of these 
latter procedures is discussed. 

These results, some of which appeared in [8], are obtained from application 
of most economical decision theory [4]. As indicated by Bechhofer [1], if the popu- 
lations differ in a known way (normal populations with different known variances, 
for example), optimal allocation of the sample sizes is apparently exceedingly 
complex; such problems are not treated here. 


2. Theorem. 

(i) Let $\{f_\theta\}$, $\theta \in \Omega \subset R_1$, be a homogeneous class of density functions⁴ w.r.t.
a fixed measure. Let $\{X_{ij}\}$ $(i = 1, \ldots, m;\ j = 1, \ldots, n)$ denote mn independent
random variables where $X_{ij}$ has the density function $f_{\theta_i}$, $\theta_i \in \Omega$, $i = 1, \ldots, m$,
and let $\theta_{[1]} \le \cdots \le \theta_{[m]}$ be the ordered values of the $\theta_i$'s. Set $\theta = (\theta_1, \ldots, \theta_m)$.

(ii) Suppose $t_i = t(x_{i1}, \ldots, x_{in})$ is a numerical sufficient statistic for
$(X_{i1}, \ldots, X_{in})$, that $t_i$ has a monotone likelihood ratio, and that $\theta_i$ is a location
parameter in the induced distribution of $t_i$ $(i = 1, \ldots, m)$.

(iii) Let $D_n$ denote any decision rule for choosing which $\theta_i$ is $\theta_{[m]}$, based on an
observation on the mn random variables $\{X_{ij}\}$, and let $D_n^0$ denote that $D_n$ which
chooses as $\theta_{[m]}$ that $\theta_i$ corresponding to the largest of the $t_i$'s with ties broken by
randomization. Suppose N is the least n for which


[ree + 6 — 0) dF,(t) + ora 


(1) 


- [Walt + 8) — Pale + 8 — OUR A(t + 8 — ODI" dP a(t) = 7 
(6>0,0<y7< 1) 


where $F_{\theta,n}(t) = F_n(t - \theta)$ is the c.d.f. of t with parameters $\theta$ and n.

Then $D_N^0$ satisfies

(a) Pr{correct decision using $D_n$ | $\theta$} $\ge \gamma$ for all $\theta$ for which $\theta_{[m]} - \theta_{[m-1]} \ge \delta$ and

(b) there does not exist a decision rule $D_n$ satisfying (a) with n < N.

COROLLARY: Replace in (i) “$R_1$” by “positive $R_1$”; replace in (ii) “location”
by “scale”; replace in (iii) “$t + \delta$” by “$t\delta$”, “$\delta > 0$” by “$\delta > 1$”, “$F_n(t - \theta)$” by
“$F_n(t/\theta)$”; replace in (a) “$\theta_{[m]} - \theta_{[m-1]}$” by “$\theta_{[m]}/\theta_{[m-1]}$”.

Note: The summation term in (1) accommodates the possibility of ties (when
r = 2, ..., m t-values may be largest) and drops out if $F_{\theta,n}$ is absolutely
continuous; hereafter, for simplicity of presentation, we make this assumption and
thereby replace (1) by

(1')    $\int F_n^{m-1}(t + \delta)\, dF_n(t) \ge \gamma.$


4 The region of positive density is independent of $\theta$.
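
For a concrete reading of the sample-size condition, consider Bechhofer's case of m normal populations with common known standard deviation $\sigma$, where t is the sample mean and $F_n(t) = \Phi(\sqrt{n}\,t/\sigma)$. The sketch below is an illustration only: it evaluates $\int F_n^{m-1}(t+\delta)\,dF_n(t)$ by a plain trapezoid rule after standardizing, and searches for the least n reaching a target $\gamma$. The function names, the integration grid and the search bound are all assumptions, and the result is not a substitute for the published tables.

```python
import math


def normal_cdf(x):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def prob_correct(n, m, delta, sigma=1.0, steps=4000, span=8.0):
    """Approximate the integral of F_n(t + delta)**(m - 1) dF_n(t).

    With z = sqrt(n) * t / sigma this is the integral of
    Phi(z + delta * sqrt(n) / sigma)**(m - 1) * phi(z) dz,
    computed by the trapezoid rule on [-span, span].
    """
    shift = delta * math.sqrt(n) / sigma
    h = 2.0 * span / steps
    total = 0.0
    for k in range(steps + 1):
        z = -span + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_cdf(z + shift) ** (m - 1) * normal_pdf(z)
    return total * h


def least_sample_size(m, delta, gamma, sigma=1.0, n_max=10000):
    """Smallest n <= n_max with prob_correct(n, ...) >= gamma."""
    for n in range(1, n_max + 1):
        if prob_correct(n, m, delta, sigma) >= gamma:
            return n
    raise ValueError("no n <= n_max meets the guarantee")


if __name__ == "__main__":
    print(least_sample_size(m=3, delta=0.5, gamma=0.9))
```

For m = 2 the integral reduces to $\Phi(\delta\sqrt{n}/(\sigma\sqrt{2}))$, which gives a quick check on the numerical integration.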


3. Proof of theorem. Set $\omega_i = \{\theta \mid \theta_i = \theta_{[m]},\ \theta_{[m]} - \theta_{[m-1]} \ge \delta\}$, and $p_i(\theta) =$
Pr{choosing $\theta_i$ using $D_n$ | $\theta$}, $i = 1, \ldots, m$. Then (a) is equivalent to: $p_i(\theta) \ge \gamma$
for $\theta \in \omega_i$ $(i = 1, \ldots, m)$.

Let $\lambda_i$ be a distribution over $\omega_i$ which assigns probability one to the $\theta$-point
with all coordinates equal to $\theta_0$ (arbitrary) except the ith coordinate which
equals $\theta_0 + \delta$; i.e., $\theta_{[1]} = \theta_{[m-1]} = \theta_0 = \theta_i - \delta$. Denote this point $\boldsymbol{\theta}_i$.

We first show that, for n fixed, $D_n^0$ is minimax for choosing among $\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_m$
where the loss function is $-1/\gamma$ if a correct decision is made and zero otherwise,
and that, when using $D_n^0$, $p_i(\boldsymbol{\theta}_i) = \int F_n^{m-1}(t + \delta)\, dF_n(t)$ for all i. Secondly, we
show that the $\lambda_i$'s are least favorable in the sense that $\inf_{\omega_i} p_i(\theta) = p_i(\boldsymbol{\theta}_i) =
\int_{\omega_i} p_i(\theta)\, d\lambda_i$, as shown in the special case of Bechhofer in [1]. Application of
Theorems 7 and 9 from [4] completes the proof of the theorem. The corollary
may be proved by applying a log transform to t, $\theta$, and $\delta$.

1.° According to well-known results of Wald (e.g., see Section 1.B of [4]), a
minimax rule for choosing among the $\boldsymbol{\theta}_i$'s with the specified loss is one which
chooses $\theta_i$ as the largest $\theta$ if $a_i h_i \ge a_j h_j$ for all j where $h(t, \theta)$ is the joint density
of $t_1, \ldots, t_m$ when the parameter is $\theta$, $h_i = h(t, \boldsymbol{\theta}_i)$, and $a_1, \ldots, a_m$ are positive
constants chosen so that $p_1(\boldsymbol{\theta}_1) = \cdots = p_m(\boldsymbol{\theta}_m)$. Denoting the density of F
by g and of $F_\theta$ by $g_\theta$ (dropping the subscript n assumed fixed), $h(t, \theta) =
g_{\theta_1}(t_1)\, g_{\theta_2}(t_2) \cdots g_{\theta_m}(t_m)$ so that $a_i h_i \ge a_j h_j$ implies

(2)    $a_i\, g_{\theta_0+\delta}(t_i)\, g_{\theta_0}(t_j) \ge a_j\, g_{\theta_0}(t_i)\, g_{\theta_0+\delta}(t_j),$

or equivalently, since $\theta$ is a location parameter, the subscripts on the g's can be
subtracted from the arguments. Denoting $r(t) = g_{\theta_0+\delta}(t)/g_{\theta_0}(t)$ for fixed $\theta_0$
and $\delta$, defined throughout the region of positive density for t, (2) implies $r(t_j) \le
r(t_i)\, a_i/a_j$. Since t has a monotone likelihood ratio, r(t) increases with t, the
inverse function exists, and (2) may be written $t_j \le r^{-1}[r(t_i)\, a_i/a_j]$. Therefore,
the probability that the minimax rule chooses $\theta_i$ as largest when $\theta = \boldsymbol{\theta}_i$ is


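(3)    $p_i(\boldsymbol{\theta}_i) = \Pr\{a_i h_i \ge a_j h_j \text{ for all } j \mid \theta = \boldsymbol{\theta}_i\}$

       $\phantom{p_i(\boldsymbol{\theta}_i)} = E\, \Pr\{a_i h_i \ge a_j h_j \text{ for all } j \mid t_i = y,\ \theta = \boldsymbol{\theta}_i\}$

       $\phantom{p_i(\boldsymbol{\theta}_i)} = \int \prod_{j \ne i} F_{\theta_0}\!\left(r^{-1}[r(y)\, a_i/a_j]\right) dF_{\theta_0+\delta}(y).$

This is independent of i if $a_1 = a_2 = \cdots = a_m$, in which case $\theta_i$ is chosen if
$h_i$ is largest. Because of the monotone property of $h_i$, the minimax rule is thus
$D_n^0$. Upon setting the $a_i$'s equal and transforming $t = y - \theta_0 - \delta$, (3) becomes
$p_i(\boldsymbol{\theta}_i) = \int F^{m-1}(t + \delta)\, dF(t)$, and is thus independent of the choice of $\theta_0$.

2.° Similarly to (3) above, for $D_n^0$ we have $p_i(\theta) = \Pr\{t_i \ge t_j \text{ for all } j \mid \theta\} =
\int \prod_{j \ne i} F_{\theta_j,n}(y)\, dF_{\theta_i,n}(y) = \int \prod_{j \ne i} F_n(u + \theta_i - \theta_j)\, dF_n(u)$ which increases with
$\theta_i - \theta_j$ for each $j \ne i$. For $\theta \in \omega_i$, $\theta_i - \theta_j \ge \delta$ so that the infimum over $\omega_i$ of $p_i$
is attained at $\theta_i - \theta_j = \delta$ for $j \ne i$ and, in particular, at $\theta = \boldsymbol{\theta}_i$.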


4. Extension to procedures for grouping by ranks. The results in Section 2 can 
be extended to the problem of ranking the populations according to their 6- 
values, or more generally, of selecting the m, “best” populations, the m,_, ‘second 
best’, etc., the m, ‘‘worst”’ populations, given m, , --- ,m,(s S m, dim = m)— 
the “general goal” expressed by Bechhofer in Section 3.B of [1]. The rule D}, is 
to rank according to ¢-values, choose N by a rule analogous to (1) or (1’) (see 
[1]), and then the probability of a correct grouping will be at least y when the 
groups are sufficiently far apart. Most economical theory is used for discriminating
among the $m!/(m_1!\,m_2! \cdots m_s!)$ possible alternative decisions. The
proof differs little except for the notational complexities.
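
The number of alternative decisions quoted above is a multinomial coefficient and is easy to check numerically. The following is a minimal Python sketch, assuming the group sizes $m_1, \ldots, m_s$ are supplied as a list; the function name `count_groupings` is hypothetical and does not come from the paper.

```python
from math import factorial

def count_groupings(group_sizes):
    """Count the ways to split m = sum(group_sizes) populations into
    ordered groups of the given sizes, i.e. m! / (m_1! m_2! ... m_s!)."""
    m = sum(group_sizes)
    count = factorial(m)
    for size in group_sizes:
        # Each intermediate quotient is itself an integer, so // is exact.
        count //= factorial(size)
    return count

# Choosing the single best of four populations: one "best", three "others".
print(count_groupings([1, 3]))
# A complete ranking of three populations: every group has size one.
print(count_groupings([1, 1, 1]))
```

With all group sizes equal to one the count reduces to $m!$, the number of complete rankings.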


5. Extension to other ordering parameters. Intuitively, a procedure which
ranks the $\theta$'s according to the values of the sufficient statistic $t$ should be optimal
whenever $\theta$ is some kind of ordering parameter in the distribution of $t$. That $t$
should have a monotone likelihood ratio is such an ordering requirement. A less
stringent requirement is that the c.d.f. of $t$ be monotone in $\theta$ for all $t$; that is,
denoting by $T_i$ a random variable with distribution parameter $\theta_i$ ($\theta_1 < \theta_2$),
$\Pr\{T_2 > t\} \ge \Pr\{T_1 > t\}$ for all $t$, in which case $T_2$ is said to be stochastically
larger than $T_1$. E. L. Lehmann has shown (Theorem 1 in [9]) that a monotone
likelihood ratio assumption implies the latter type of ordering. That $\theta$ be a location
parameter is an additional ordering requirement, namely that $F_\theta$ be a particular
kind of monotone function, $F(t - \theta)$. It was required in the theorem so
that the probability in (a) could be computed on the condition that the best
population was sufficiently distant from the second best without regard to the
location of the best population; that $F_\theta$ be monotone was also required, but this
follows from the monotone likelihood ratio assumption. Thus, the location
requirement in (ii) can be removed by adding it in (a), so that replacing $F(t + \delta)$
by $F_{\theta_0-\delta}(t)$ and $dF$ by $dF_{\theta_0}$ in (1) and (1') and replacing (a) by (a') in which
the inequality is required to hold for all $\theta$ for which $\theta_{(m)} = \theta_0$ and $\theta_{(m)} - \theta_{(m-1)} \ge \delta$
for some specified value $\theta_0$ of the parameter, the theorem and proof remain valid.
The guarantee of a correct decision is only calculated at one location, specified
by $\theta_0$.

In many such problems, it will be possible to find a least favorable location
$\theta_0$, i.e., a value of $\theta_0$ which minimizes $\int F^{m-1}_{\theta_0-\delta}(t)\,dF_{\theta_0}(t)$, in which case (a) need
not be replaced by (a'). Sufficient conditions are that $\Omega$ be bounded and closed.
If not, it may be possible to find a least favorable sequence, applying Theorem
8 of [4].

In this revised form, the theorem applies to all such problems of choosing 
the best population whenever there is a numerical sufficient statistic with a 
monotone likelihood ratio, and therefore, in particular, if its distribution is in 
the exponential family. Thus, it holds for Sobel and Huyett's procedures [7],
the condition (a) corresponding to their "original specification" using a least
favorable $\theta_0$, and (a') to their "alternative specification."


6. A distribution-free extension. It can be shown that Sobel and Huyett’s 
procedure is optimal not only for choosing the best binomial population but for 
the more general problem they describe of choosing the population with the 
largest "survival probability", with no parametric specification of the underlying
distributions. If the distributions differ only in location, then the problem
is equivalent to that of choosing the population with the largest median. This
application is adapted from an example by W. Hoeffding [10].

Let the class of density functions under consideration include all densities,
$f$, w.r.t. a fixed measure $\mu$ on the real line such that $0 < \mu(\{x \ge a\}) < 1$ for
some specified $a$. The $\{X_{ij}\}$ are assumed independent with $\theta_i = \Pr\{X_{ij} \ge a\}$,
constant over $j = 1, \ldots, n$ ($i = 1, \ldots, m$). If $X_{ij}$ represents a lifetime, then
$\theta_i$ is the probability of survival to age $a$, and none of the distributions need
coincide except in their $\theta$-values.

Extension of the theorem can be accomplished as indicated briefly here: Subsets
$\{\omega_i\}$ of density functions are specified in terms of the $\theta$'s as in Section 3; a priori
distributions over these sets are specified, somewhat as in example 5 in [4], which
reduces the problem to that of choosing the best of $m$ binomial distributions.
The decision procedure is to choose $\theta_i$ as the largest of the $\theta_j$'s if more of the
$x_{ij}$'s exceed $a$ than do the $x_{i'j}$'s for any other $i'$. That these a priori distributions
are least favorable follows as in Section 3, using Theorem 7 from [4], and noting
that the probability of a correct decision depends only on the $\theta$-values. A least
favorable $\theta_0$ can be chosen, if desired, as in [7]. Thus, the procedure of Sobel and
Huyett is most economical for this distribution-free problem in the sense that
(a), or (a'), and (b) are satisfied.
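
The decision procedure of this section amounts to counting exceedances of the age $a$ in each sample and selecting the sample with the largest count. A minimal Python sketch follows; the function name `choose_largest_survival`, the strict comparison `x > a`, and the tie-breaking by lowest index are assumptions made for illustration, since the text does not specify how ties are resolved.

```python
def choose_largest_survival(samples, a):
    """Pick the population whose sample has the most observations exceeding a.

    samples: list of m lists of observations, one list per population.
    Returns (index of the chosen population, list of exceedance counts).
    Ties go to the lowest index; this tie rule is an assumption.
    """
    counts = [sum(1 for x in sample if x > a) for sample in samples]
    best = max(range(len(samples)), key=lambda i: counts[i])
    return best, counts

# Three hypothetical lifetime samples and a survival age of 4.5.
lifetimes = [[1, 5, 7], [6, 8, 9], [2, 3, 4]]
print(choose_largest_survival(lifetimes, 4.5))
```

Only the counts enter the decision, which is why the problem reduces to a comparison of $m$ binomial populations.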
THE ADMISSIBILITY OF PITMAN'S ESTIMATOR OF A SINGLE
LOCATION PARAMETER


By CHARLES STEIN
Stanford University and University of California, Berkeley


1. Introduction. Pitman [1] gave a thorough discussion of the problem of 
estimating the location and scale parameters of a distribution which is known 
except for one or both of these parameters. In particular, if $X_1, \ldots, X_n$ are real
random variables independently and identically distributed according to the
density $r(x - \xi)$ (with respect to Lebesgue measure), where $\xi$ is unknown but
the function $r$ is known, Pitman shows that the estimator

(1.1)  $\hat\xi_0(X_1, \ldots, X_n) = \dfrac{\int \xi \prod_i r(X_i - \xi)\,d\xi}{\int \prod_i r(X_i - \xi)\,d\xi}$

is the best translation-invariant estimator in the sense that it minimizes
$\mathcal{E}_\xi(\hat\xi(X_1, \ldots, X_n) - \xi)^2$ among all estimators $\hat\xi$ for which

(1.2)  $\hat\xi(x_1 + c, \ldots, x_n + c) = \hat\xi(x_1, \ldots, x_n) + c$

for all $x_1, \ldots, x_n$ and $c$. Girshick and Savage [2] showed that $\hat\xi_0$ is minimax in
the class of all estimators (not restricted by (1.2)) and this also follows from the
later more general results of Kudo [3] and Kiefer [4]. Karlin [5] has shown that
under certain conditions $\hat\xi_0$ is admissible, that is, if $\hat\xi$ is any estimator for which

(1.3)  $\mathcal{E}_\xi(\hat\xi(X_1, \ldots, X_n) - \xi)^2 \le \mathcal{E}_\xi(\hat\xi_0(X_1, \ldots, X_n) - \xi)^2$

for all $\xi$, then equality holds for all $\xi$. Since his conditions are fairly strong, and
his method somewhat special, it seems desirable to present an alternative proof. 
Theorem 1 of Section 2, when reformulated for the present slightly special case, 
becomes 

THEOREM. If


| II (2) [fer IT r(x: ae 
| [IL rca — 8) ae 


ios [eT r(x; — £) dé r 


then $\hat\xi_0$ defined by (1.1) is admissible.


dx; < © 


Received August 25, 1958.

The condition (1.4) is not very strong. For example, if there is any translation-invariant
estimator $\hat\xi$ for which $\mathcal{E}_\xi|\hat\xi - \xi|^3 < \infty$, then (1.4) holds. For the
Cauchy distribution $r(x) = 1/\pi(1 + x^2)$ with $n \ge 7$, this is true with $\hat\xi$ equal to
the sample median.
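
The estimator (1.1) is a ratio of two one-dimensional integrals and can be approximated numerically for any known density $r$. The following Python sketch is illustrative only: the function names, the truncation of the integrals to a window of half-width 50 around a central order statistic, and the trapezoidal grid are all assumptions, not part of the paper.

```python
import math

def cauchy_density(x):
    """Standard Cauchy density 1 / (pi (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))

def normal_density(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def pitman_estimate(xs, density, half_width=50.0, steps=20001):
    """Approximate (1.1) by the trapezoidal rule.

    The integrals over xi are truncated to a window centred at a middle
    order statistic of the sample; window and grid size are ad hoc choices.
    """
    centre = sorted(xs)[len(xs) // 2]
    h = 2.0 * half_width / (steps - 1)
    num = den = 0.0
    for k in range(steps):
        xi = centre - half_width + k * h
        weight = 0.5 if k in (0, steps - 1) else 1.0
        likelihood = 1.0
        for x in xs:
            likelihood *= density(x - xi)
        num += weight * xi * likelihood
        den += weight * likelihood
    return num / den  # the common factor h cancels in the ratio

sample = [0.3, -1.2, 2.5, 0.9, 4.0, -0.7, 1.1]
print(pitman_estimate(sample, cauchy_density))
```

Because the grid moves with the sample, the approximation inherits the translation property (1.2) up to rounding, and for the normal density it reproduces the sample mean.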

The proof is given by a method first used by Blyth [6], and the result seems to 
be the best possible obtainable by this method. Here, as in Lehmann and Stein 
[7], roughly speaking, the theorem requires one more moment than is clearly 
relevant. In [7] a first moment is required, although it is a testing problem, and 
here, a third moment rather than a second. It would be interesting to know 
whether conditions of this type are necessary. Essentially the same method will 
be applied in a paper, now being prepared, to the problem of estimating two un- 
known location parameters with quadratic loss. There it is necessary to vary the 
form, as well as the scale, of the a priori distribution (see the argument around 
(2.16) ). The bivariate normal case has already been treated by the author in 
[8]. For three or more translation parameters with positive definite quadratic 
loss, Pitman’s estimator is not admissible. This was proved in the normal case 
in [8]. While it is of some theoretical interest to prove the admissibility of the 
natural estimator when it is admissible, the careful study of other estimators when 
the natural estimator is not admissible has greater practical value. 

It may be useful to indicate the correspondence between the notation used in 
this introduction and that of the slightly more general problem treated in the 
remainder of the paper. Let $\mathcal{Y}$ be the $(n - 1)$-dimensional real coordinate space,
$\mathcal{C}$ the $\sigma$-algebra of all Borel subsets of $\mathcal{Y}$ and $\nu$ the distribution of $Y$ defined by
(1.9).


| ar(x)r(a + ys) +++ r(x + yo) ax 
(1.5) O_o 


r(x)r(x + y:) +++ (a + Yar) dz 


and 


(1.6) gly) = / r(x)r(a + ys) +++ r(a + Yar) az, 


where 


(1.7) } = (fi, *** » Yn). 
Also, let : 
at oie at r(x + fly) )r(a + fly) — wm) «++ re + f(y) = Ya) | 
g(y) 
Then conditions (2.1) and (2.2) are satisfied by p, and (1.4) reduces to (2.3). 
If we define the random point (X, Y) by 
Y, = X.— Xi, 
(1.9) : 
Yu = Xn _ X, ’ 


(1.10) Xo RK -A¥s, +e), 
then the estimate $X$, proved admissible in Section 2, is seen to reduce to
$\hat\xi_0(X_1, \ldots, X_n)$.


2. The results. Let $\mathcal{B}$ be the $\sigma$-algebra of all Borel subsets of the real line $\mathcal{X}$,
and $\mathcal{C}$ a $\sigma$-algebra of subsets of a set $\mathcal{Y}$. Let $\mu$ be Lebesgue measure on $\mathcal{B}$ and $\nu$
a probability measure on $\mathcal{C}$. Let $p$ be a nonnegative valued $\mathcal{BC}$ measurable function
on $\mathcal{X} \times \mathcal{Y}$ such that

(2.1)  $\int p(x, y)\,dx = 1$,

(2.2)  $\int x\,p(x, y)\,dx = 0$

for all $y$, and

(2.3)  $\int d\nu(y) \left( \int x^2 p(x, y)\,dx \right)^{3/2} < \infty$,

where we write $dx$ instead of $d\mu(x)$. Then of course $p$ is a probability density with
respect to $\mu\nu$. We shall prove

THEOREM 1. Under the above hypotheses, if we observe $(X, Y)$ distributed so that,
for some unknown $\xi$, $(X - \xi, Y)$ has probability density $p$ with respect to $\mu\nu$, then
$X$ is an admissible estimator of $\xi$ with squared error as loss.

In other words, if $\varphi$ is any $\mathcal{BC}$ measurable function on $\mathcal{X} \times \mathcal{Y}$ such that

(2.4)  $\int d\nu(y) \int [\varphi(x, y) - \xi]^2 p(x - \xi, y)\,dx \le \int d\nu(y) \int (x - \xi)^2 p(x - \xi, y)\,dx = \int d\nu(y) \int x^2 p(x, y)\,dx$

for all $\xi$, then the two sides of (2.4) are identically equal. Actually we prove the
trivially stronger result that $\varphi(x, y) = x$ almost everywhere ($\mu\nu$). One might
hope to prove this result under the condition

(2.3')  $\int d\nu(y) \int x^2 p(x, y)\,dx < \infty$,

which is weaker than (2.3). Of course (2.3') is necessary, for otherwise we could
take $\varphi(x, y) = 0$.

We shall derive Theorem 1 from a slightly more general but weaker theorem.
With $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{B}$, $\mathcal{C}$, $\mu$, $\nu$ as before, let $P$ be a nonnegative valued $\mathcal{BC}$ measurable
function on $\mathcal{X} \times \mathcal{Y}$ such that, for each $y$, $P(\cdot, y)$ is a cumulative distribution
function and

(2.5)  $\int x\,d_x P(x, y) = 0$,

(2.6)  $\int d\nu(y) \left( \int x^2\,d_x P(x, y) \right)^{3/2} < \infty$.

THEOREM 2. Under the above hypotheses, if we observe $(X, Y)$ distributed so that,
for some unknown $\xi$, $Y$ is distributed according to $\nu$ and the conditional cumulative
distribution function of $X - \xi$ given $Y$ is $P(\cdot, Y)$, then $X$ is an almost admissible
estimator of $\xi$ with squared error as loss. That is, if $\varphi$ is any $\mathcal{BC}$ measurable function
on $\mathcal{X} \times \mathcal{Y}$ such that

(2.7)  $\int d\nu(y) \int [\varphi(x, y) - \xi]^2\,d_x P(x - \xi, y) \le \int d\nu(y) \int (x - \xi)^2\,d_x P(x - \xi, y) = \int d\nu(y) \int x^2\,d_x P(x, y)$

for all $\xi$, then the two sides are equal for almost all $\xi$.


By a familiar argument, we observe that Theorem 1 follows from Theorem 2,
if we put

(2.8)  $P(x, y) = \int_{-\infty}^{x} p(t, y)\,dt$.

If $\varphi$ satisfies the hypotheses of Theorem 1, it also satisfies those of Theorem 2
and we conclude that in (2.4) equality holds for almost all $\xi$. Now suppose that
contrary to the conclusion of Theorem 1,

(2.9)  $\mu\nu(S) > 0$,

where

(2.10)  $S = \{(x, y) : \varphi(x, y) \ne x\}$.

Then, for all $\xi$ in a set $T$ of positive measure,

(2.11)  $\int d\nu(y) \int_{S_y} p(x - \xi, y)\,dx > 0$,

where $S_y = \{x : \varphi(x, y) \ne x\}$, since

(2.12)  $\int d\xi \int d\nu(y) \int_{S_y} p(x - \xi, y)\,dx = \int d\nu(y) \int_{S_y} dx \int p(x - \xi, y)\,d\xi = \int d\nu(y) \int_{S_y} dx = \mu\nu(S)$.

Let

(2.13)  $\varphi_0(x, y) = \tfrac12 (x + \varphi(x, y))$.

Then

(2.14)  $[\varphi_0(x, y) - \xi]^2 \le \tfrac12 [\varphi(x, y) - \xi]^2 + \tfrac12 (x - \xi)^2$

with strict inequality whenever $\varphi(x, y) \ne x$. It follows that, with $\varphi_0$ in place of $\varphi$,
we have strict inequality in (2.4) and thus in (2.7) for all $\xi \in T$, contradicting the conclusion
of Theorem 2. An example given by Blackwell [9] with $\mathcal{Y}$ reducing to a point
and $P$ concentrated on a finite set shows that in Theorem 2 we cannot conclude
admissibility.
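
The strictness claimed for (2.14) can be read off from an elementary identity. Writing $a = x - \xi$ and $b = \varphi(x, y) - \xi$ (shorthand introduced only for this remark, not notation from the paper), a short worked step is:

```latex
\[
\Bigl(\tfrac{a+b}{2}\Bigr)^{2}
  = \tfrac12 a^{2} + \tfrac12 b^{2} - \tfrac14 (a-b)^{2},
\qquad a = x - \xi, \quad b = \varphi(x,y) - \xi ,
\]
so that
\[
[\varphi_0(x,y) - \xi]^{2}
  = \tfrac12 [\varphi(x,y) - \xi]^{2} + \tfrac12 (x - \xi)^{2}
    - \tfrac14 [\varphi(x,y) - x]^{2} .
\]
```

The last term vanishes exactly when $\varphi(x, y) = x$, so the inequality is strict on the set $S$ of (2.10).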

To prove Theorem 2 we suppose the conclusion does not hold, that is, we suppose
(2.7) holds with strict inequality for $\xi$ in a set $S$ having positive Lebesgue
measure. For $\varepsilon > 0$, let $S_\varepsilon$ be the set of $\xi$ for which

(2.15)  $\int d\nu(y) \int [\varphi(x, y) - \xi]^2\,d_x P(x - \xi, y) \le \int d\nu(y) \int x^2\,d_x P(x, y) - \varepsilon$.

Since $S = \bigcup S_\varepsilon$, $S_\varepsilon$ will have positive Lebesgue measure for sufficiently small $\varepsilon$,
and we suppose $\varepsilon$ chosen so that $\mu(S_\varepsilon) > 0$. Since $S_\varepsilon$ (like any measurable set)
is of density 1 at almost all points of itself (see for example Titchmarsh [10],
p. 371), there exists $\kappa > 0$ and an interval $J = (a - \kappa, a + \kappa)$ such that the
set of $\xi \in J$ for which (2.15) holds has Lebesgue measure $\ge \kappa$. There is no real
loss of generality in assuming $J = (-\kappa, \kappa)$. Now we assign to $\xi$ an a priori density
$(1/\sigma)q(\xi/\sigma)$, taking for simplicity of computation

(2.16)  $q(\xi) = \dfrac{1}{\pi(1 + \xi^2)}$.


From (2.7), and the fact that (2.15) holds for a set of measure $\ge \kappa$ in $(-\kappa, \kappa)$,
it follows that

(2.17)  $\mathcal{E}[\varphi(X, Y) - \xi]^2 \le \int d\nu(y) \int x^2\,d_x P(x, y) - \dfrac{\varepsilon\kappa}{2\pi\sigma}$

for sufficiently large $\sigma$, where $\xi$ has the indicated a priori distribution, and the
conditional distribution of $(X, Y)$ given $\xi$ is that indicated before Theorem 2.
However, we shall show that under the same distribution

(2.18)  $\inf_\psi \mathcal{E}[\psi(X, Y) - \xi]^2 = \int d\nu(y) \int x^2\,d_x P(x, y) - \dfrac{f(\sigma)}{\sigma}$,

where

(2.19)  $\lim_{\sigma \to \infty} f(\sigma) = 0$.

For sufficiently large $\sigma$ this contradicts (2.17).
We shall find the formula 


[ ao(u) [ 2 aP(x, y) - inf Ely(X, ¥) — ef 


a ite * | doy) | dz L/ nq (: _ ") dP(n, » | 








fo(Z=*) arcu) 
useful in proving (2.18). To prove (2.20) we first observe that


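(2.21)  $\inf_\psi \mathcal{E}[\psi(X, Y) - \xi]^2 = \inf_\psi \mathcal{E}\,\mathcal{E}\{(\psi(X, Y) - \xi)^2 \mid X, Y\} = \mathcal{E}\{\mathcal{E}[\xi \mid (X, Y)] - \xi\}^2$,

so that

$\int d\nu(y) \int x^2\,d_x P(x, y) - \inf_\psi \mathcal{E}[\psi(X, Y) - \xi]^2$

$= \mathcal{E}(X - \xi)^2 - \mathcal{E}\{\mathcal{E}[\xi \mid (X, Y)] - \xi\}^2$
$= \mathcal{E}\{X^2 - 2X\xi + (\mathcal{E}[\xi \mid (X, Y)])^2\} = \mathcal{E}\{X - \mathcal{E}[\xi \mid (X, Y)]\}^2$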


a(®) az | doy) | 


\2 
dy P(x — #, y)| 
eee leeeenagmeneseneemncnenoen —~) ad, P(z — £, y) 
dy P(x — #',y) 


f (2=*) dP (n, | 


q fo(=? re dP(n, y 
o 


d, P(x — &, y) 


: [a bce 
- dv(y) / a(‘) az | ——A~__% -— ——-- dP(n’, y) 
og 

fal t4— scan y) 


=* fay) f aq.) [|= 3 ed ae 


“2 


J ma (22) ava »| 


Tete, ») | 
Comparing this result with (2.17) and (2.18), we see that, in order to complete
the proof of Theorem 2, we need only show that


za 
(2.23) lim | dry) / eee dx = 0. 
= a(? — ) aPC») 


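In order to prove (2.23) we consider the integral

(2.24)  $\Phi(P, \sigma) = \int \dfrac{\left[ \int \eta\, q\!\left( \frac{x - \eta}{\sigma} \right) dP(\eta) \right]^2}{\int q\!\left( \frac{x - \eta}{\sigma} \right) dP(\eta)}\,dx$

and the function

(2.25)  $\Psi(\lambda, \sigma) = \sup_{P \in U_\lambda} \Phi(P, \sigma)$,

where $U_\lambda$ is the set of probability measures $P$ for which

(2.26)  $\int \eta\,dP(\eta) = 0$,

(2.27)  $\int \eta^2\,dP(\eta) = \lambda$.

As indicated earlier, we take

(2.28)  $q(\xi) = \dfrac{1}{\pi(1 + \xi^2)}$,

but the basic formulas hold for an arbitrary $q$. We first observe that

(2.29)  $\Psi(\lambda, \sigma) = \sigma^3\,\Psi\!\left( \dfrac{\lambda}{\sigma^2}, 1 \right)$,

since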








(2.30) 


[a ts < hen) 


2 


[ ra — n) aP (ye) 
= ee : 
We observe also that

(2.31)  $\Psi(\lambda, 1) \le \lambda$,

a bound which will be useful only for large $\lambda$. This follows from the convexity
of $\Phi$, or from

(2.32)  $\Phi(P, 1) = \int x^2\,dP(x) - \inf_\psi \mathcal{E}[\psi(X) - \xi]^2$,

with $\xi$ distributed according to $q$, and $X - \xi$ given $\xi$ according to $P$, which is
essentially (2.22).
For \ s 3, 


[ ae —n) dP(y) = P{—1,1) inf g(x —2z) 
(2.33) ze[—1,1] 


9 
) 


i ‘ 
“5e(1 +22) 10"(1 + 2) 


by Chebyshev’s inequality. Also 


lf nq(x — n) ar(a) | = tf niq(x — n) — q(2)] aP(a)} 


s [i dP(») / [g(a — n) — g(x)]’ dP(n) 


by Schwarz’s inequality. Thus 


| ra — n) aP(x) | 


o(P, 1) = [ a E —— 
q(x — ) dP(n) 


10 2) (f Par fies} -; tale 
25 [aa + 2x) ( n di (n)) F sac w ite dP(n) 
= . (f n ar(a)) f apc J | : be ate [atea. 

3 , tl+(z—s9)? 142 


1 1 ; 2 ” 
flaetigc ital ot+oe 


Pee bors of 
(x — »)*}? 1 + (z — »)* 


dx 
(1 + 2°)?’ 
so that

(2.37) #(P,1) < q [°aP(a) | 
1.€., 

(2.38) Y(A, 1) S co for A <3 


Combining (2.29), (2.31), and (2.38), we have finally 


(SN fork Sho 
(2.39) W(Ao) < 4 


2 


ioAfork 2 3a. 


Now let »* be the distribution of fq’ dP(q, Y), i.e., 
240) 018) mv du: fap a) 
(2.40) v*(S) =v dy: n aP(n, y) eS? . 

\ ) 


Then 


coat) fav ofall *) aca | ai 





- dN” dv*(\) + “| Adv*(X). 


ne o 0 
(==) dP(n, y) 
o 


For any « between 0 and 1, choose a» so large that 
(2.42) aa *dv*(X) <e. 
u 


Then, for « = a 


je? ois 
s N* dv*(X) if n* dv*(d) #2 4,2 d* dy*(n) 
g #0 


oael €e€2 ) 
2.4: < he 8/2 dy* % [. *? d* 
(2.43) ZV 3), r ay’ oe dv*(X) 
le r 3/2 € 
= rn” dv*(A —. 
~ v*( +7 


(2.44) an d dv*( svi], r*? dv*(A) S V2. 


iA 


Thus the right-hand side of (2.41) approaches 0 as $\sigma \to \infty$, which completes
the proof of (2.23), and thus the proofs of Theorems 2 and 1.
THE USE OF SAMPLE QUASI-RANGES IN ESTIMATING
POPULATION STANDARD DEVIATION


H. LEON HARTER
Wright Air Development Center 


Summary. The use of sample quasi-ranges in estimating the standard deviation 
of normal, rectangular, and exponential populations is discussed. For the normal 
population, the expected value and the variance of the rth quasi-range for samples 
of size n are tabulated for r = 0 (1) 8 and n = (2r + 2) (1) 100. The efficiency 
of the unbiased estimate of population standard deviation based on one sample 
quasi-range is tabulated for the same values of r, with n = (2r + 2) (2) 50 (5) 100. 
Estimates based on a linear combination of two quasi-ranges are considered 
and a method is given for determining the weighting factor which maximizes the 
efficiency. The most efficient unbiased estimates based on one quasi-range for 
n = 2(1) 100 and on linear combinations of two adjacent quasi-ranges and of 
two quasi-ranges among those with $r < r' \le 8$ for n = 4 (1) 100 are tabulated,
along with their efficiencies. An example illustrates the use of these estimates. 
For the rectangular population, the efficient estimate of population standard 
deviation, which is based on the sample range, is tabulated for n = 2 (1) 100. 
The bias, when estimates which assume normality are used, is tabulated for 
n = 2 (1) 100 for rectangular and exponential populations. 


0. Introduction. It is well known that, for small samples, the standard deviation 
of a normal population can be estimated quite efficiently from the sample range. 
However, the efficiency of the estimate based on the range decreases rather 
rapidly as the sample size increases, being less than 35% for samples of 100. 
There appears to be a need for substitute estimates which are reasonably efficient 
for moderate sample sizes, yet much simpler to compute than the efficient esti- 
mate based on the sample standard deviation. A number of authors, including 
Jones [9], Nair [12], [13], Godwin [6] and Sarhan and Greenberg [17] have 
proposed methods based on order statistics. This paper will be concerned with 
estimates, based on sample quasi-ranges, that satisfy these requirements quite 
well. Up to the present such estimates have been used relatively little, mainly 
because of the lack of suitable tables, though estimates based on quasi-ranges 
of samples of moderate size were proposed by Mosteller [11] in 1946, and esti- 
mates based on quantiles of large samples had been advocated much earlier by a 
number of authors, notably Edgeworth [5], Sheppard [18], and K. Pearson [15]. 
More recently, Benson [1] has explored further aspects of estimates of the latter 
type. 

The rth quasi-range, $w_r$, of a sample of size n is defined as the range of $(n - 2r)$
sample values, omitting the r largest and the r smallest. Symbolically,
$w_r = x_{n-r} - x_{r+1}$, where $x_1 \le x_2 \le \cdots \le x_n$ are the ordered sample values.
Cadwell [2] has shown that the range, $w_0$, is the most efficient statistic of this
type for estimating the standard deviation of a normal population from samples
of sizes up through n = 17, beyond which point $w_1$ is optimum up through
n = 31, where $w_2$ becomes better. Cadwell [2] has also proposed the use of linear
combinations of quasi-ranges, while Dixon [4] has advocated the use of sums of 
from two to four quasi-ranges with equal weights. 
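
The definition of the rth quasi-range translates directly into code. The following is a minimal Python sketch; the function name `quasi_range` is hypothetical, and the requirement that at least two values remain after trimming, so that n is at least 2r + 2, matches the tabulated range of n.

```python
def quasi_range(sample, r=0):
    """Return the rth quasi-range w_r = x_(n-r) - x_(r+1) of a sample.

    r = 0 gives the ordinary range. Requires n >= 2r + 2 so that two
    distinct order statistics remain after dropping r values at each end.
    """
    xs = sorted(sample)
    n = len(xs)
    if r < 0 or n < 2 * r + 2:
        raise ValueError("need r >= 0 and n >= 2r + 2")
    # 1-based order statistics x_(n-r) and x_(r+1) are xs[n-r-1] and xs[r].
    return xs[n - r - 1] - xs[r]

data = [5, 1, 9, 3, 7, 2]
print([quasi_range(data, r) for r in range(3)])
```

Dividing $w_r$ by its expected value for a standard normal sample of the same size gives the unbiased estimate of $\sigma$ discussed in Section 1.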

Section 1 of this paper will deal with the most efficient unbiased estimates of 
the standard deviation of a normal population, based on one sample quasi- 
range and on linear combinations of two sample quasi-ranges. Section 2 will 
be concerned with the efficient estimate of the standard deviation of a rectangular 
distribution, based on the range, and with the bias of the estimates which assume 
normality when the population is actually rectangular or exponential. 


1. Estimates of $\sigma$ for a normal population.

1.1. Expected values of quasi-ranges. In order to determine the factor by which
the rth quasi-range, $w_r$, must be multiplied in order to obtain an unbiased estimate
of the population standard deviation $\sigma$, it is necessary to know the expected
value $E(w_r)$ of the rth quasi-range for samples of n from a standard normal
population, which is given by Cadwell ([2], p. 606) in terms of an integral
which cannot be evaluated in closed form. Expected values of the range (to
five decimal places) have been tabulated for n = 2 (1) 1000 by Tippett [20].
Cadwell [2] has tabulated $E(w_1)$ to four decimal places for n = 10 (1) 30. The
author has computed tables of $E(w_r)$, accurate to within a unit in the sixth
decimal place, for r = 0 (1) 8 and n = (2r + 2) (1) 100, using the Burroughs
E101 computer. The trapezoidal rule was employed for the numerical integration.
The results, which are given in Table 1, agree with those obtained by Tippett
and Cadwell to within a unit in the last place published by them. The values in
Table 1 also agree with those found by doubling the expected values of order
statistics, which have been tabulated to ten decimal places for n = 2 (1) 20 by
Teichroew [19], and rounding to six decimal places.
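
Entries of Table 1 can be spot-checked with a short trapezoidal-rule computation. The Python sketch below does not use Cadwell's integral; as a swapped-in alternative it integrates the standard density of a single normal order statistic and takes the difference of two expectations. The function names, the integration limits of plus or minus 10, and the grid of 4001 points are assumptions chosen for illustration.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def expected_order_statistic(k, n, lo=-10.0, hi=10.0, steps=4001):
    """E of the kth smallest of n standard normal values, by the
    trapezoidal rule applied to the order-statistic density."""
    coeff = k * math.comb(n, k)  # n! / ((k-1)! (n-k)!)
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        x = lo + i * h
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += (weight * x * std_normal_cdf(x) ** (k - 1)
                  * std_normal_cdf(-x) ** (n - k) * std_normal_pdf(x))
    return coeff * h * total

def expected_quasi_range(r, n):
    """E(w_r) = E(x_(n-r)) - E(x_(r+1)) for a standard normal sample."""
    return expected_order_statistic(n - r, n) - expected_order_statistic(r + 1, n)

print(round(expected_quasi_range(0, 10), 6))
print(round(expected_quasi_range(1, 4), 6))
```

For n = 2 and r = 0 the result should agree with the closed form $2/\sqrt{\pi}$, which gives a check independent of the table. An unbiased estimate of $\sigma$ is then the observed $w_r$ divided by `expected_quasi_range(r, n)`.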

1.2. Variances of quasi-ranges. In order to determine the variances of unbiased
estimates based on quasi-ranges (and hence their efficiencies), it is necessary
to know the variance of the rth quasi-range for samples of n from a standard
normal population. This is given by the equation $\operatorname{var} w_r = E(w_r^2) - [E(w_r)]^2$,
where $E(w_r^2)$ can be obtained by multiplying the probability density function
of $w_r$ (see Cadwell [2], p. 604) by $w_r^2$ and integrating with respect to $w_r$ between
the limits 0 and $\infty$. Tippett [20] and E. S. Pearson [14] have computed approximate
values of the variance of the range for a few values of n. Cadwell [2] has
tabulated var $w_1$ to four decimal places for n = 10 (1) 30. The variance of all
quasi-ranges for samples of n = 2 (1) 20 can be obtained quite easily from
ten-decimal-place values of the variances and covariances of order statistics, which
have been tabulated by Sarhan and Greenberg [17]. These tables are based on
ten-decimal-place expected values of order statistics and of products of order
statistics tabulated by Teichroew [19]. The author, with the assistance of Eugene
TABLE 1

Expected Value of the rth Quasi-Range for Samples of n from N(μ, 1)

n | r = 0 | r = 1 | r = 2 | r = 3 | r = 4 | r = 5 | r = 6 | r = 7 | r = 8
2 | 1.128379 | | | | | | | |
3 | 1.692569 | | | | | | | |
4 | 2.058751 | 0.594023 | | | | | | |
5 | 2.325929 | 0.990038 | | | | | | |
6 | 2.534413 | 1.283510 | 0.403094 | | | | | |
7 | 2.704357 | 1.514749 | 0.705414 | | | | | |
8 | 2.847201 | 1.704450 | 0.945645 | 0.305029 | | | | |
9 | 2.970026 | 1.864595 | 1.143942 | 0.549052 | | | | |
10 | 3.077505 | 2.002714 | 1.312118 | 0.751529 | 0.245336 | | | |
11 | 3.172873 | 2.123833 | 1.457679 | 0.923957 | 0.449782 | | | |
12 | 3.258455 | 2.231464 | 1.585676 | 1.073686 | 0.624498 | 0.205179 | | |
13 | 3.335980 | 2.328154 | 1.699669 | 1.205700 | 0.776654 | 0.381047 | | |
14 | 3.406763 | 2.415805 | 1.802253 | 1.323527 | 0.911132 | 0.534594 | 0.176318 | |
15 | 3.471827 | 2.495870 | 1.895378 | 1.429755 | 1.031402 | 0.670592 | 0.330597 | |
16 | 3.531983 | 2.569488 | 1.980542 | 1.526333 | 1.140019 | 0.792446 | 0.467503 | 0.154575 |
17 | 3.587884 | 2.637564 | 2.058922 | 1.614770 | 1.238915 | 0.902667 | 0.590373 | 0.291975 |
18 | 3.640064 | 2.700827 | 2.131456 | 1.696250 | 1.329589 | 1.003163 | 0.701674 | 0.415471 | 0.137605
19 | 3.688963 | 2.759877 | 2.198906 | 1.771724 | 1.413223 | 1.095415 | 0.803285 | 0.527486 | 0.261450
20 | 3.734950 | 2.815208 | 2.261896 | 1.841963 | 1.490766 | 1.180594 | 0.896664 | 0.629866 | 0.373915
21 | 3.778336 | 2.867236 | 2.320945 | 1.907604 | 1.562992 | 1.259644 | 0.982970 | 0.724051 | 0.476816
22 | 3.819385 | 2.916311 | 2.376488 | 1.969174 | 1.630538 | 1.333334 | 1.063136 | 0.811183 | 0.571570
23 | 3.858323 | 2.962731 | 2.428893 | 2.027118 | 1.693938 | 1.402301 | 1.137928 | 0.892185 | 0.659305
24 | 3.895348 | 3.006755 | 2.478476 | 2.081814 | 1.753638 | 1.467076 | 1.207975 | 0.967812 | 0.740931
25 | 3.930629 | 3.048602 | 2.525506 | 2.133585 | 1.810021 | 1.528108 | 1.273807 | 1.038691 | 0.817195
26 | 3.964316 | 3.088468 | 2.570219 | 2.182707 | 1.863411 | 1.585779 | 1.335872 | 1.105347 | 0.888717
27 | 3.996539 | 3.126520 | 2.612819 | 2.229423 | 1.914092 | 1.640416 | 1.394549 | 1.168222 | 0.956017
28 | 4.027414 | 3.162907 | 2.653484 | 2.273942 | 1.962307 | 1.692302 | 1.450167 | 1.227697 | 1.019535
29 | 4.057044 | 3.197761 | 2.692372 | 2.316449 | 2.008271 | 1.741683 | 1.503007 | 1.284097 | 1.079647
30 | 4.085522 | 3.231200 | 2.729624 | 2.357108 | 2.052170 | 1.788775 | 1.553316 | 1.337704 | 1.136678
31 | 4.112928 | 3.263326 | 2.765363 | 2.396061 | 2.094171 | 1.833766 | 1.602311 | 1.388764 | 1.190907
32 | 4.139338 | 3.294235 | 2.799700 | 2.433439 | 2.134420 | 1.876825 | 1.647180 | 1.437492 | 1.242580
33 | 4.164817 | 3.324009 | 2.832734 | 2.469355 | 2.173049 | 1.918099 | 1.691092 | 1.484078 | 1.291910
34 | 4.189425 | 3.352725 | 2.864556 | 2.503912 | 2.210174 | 1.957721 | 1.733196 | 1.528690 | 1.339088
35 | 4.213219 | 3.380451 | 2.895245 | 2.537204 | 2.245901 | 1.995809 | 1.773626 | 1.571478 | 1.384281
36 | 4.236247 | 3.407249 | 2.924875 | 2.569314 | 2.280326 | 2.032471 | 1.812500 | 1.612575 | 1.427639
37 | 4.258554 | 3.433177 | 2.953513 | 2.600317 | 2.313532 | 2.067802 | 1.849927 | 1.652101 | 1.469294
38 | 4.280183 | 3.458286 | 2.981218 | 2.630284 | 2.345600 | 2.101890 | 1.886002 | 1.690164 | 1.509367
39 | 4.301171 | 3.482623 | 3.008047 | 2.659277 | 2.376598 | 2.134813 | 1.920814 | 1.726860 | 1.547965
40 | 4.321554 | 3.506233 | 3.034049 | 2.687353 | 2.406591 | 2.166643 | 1.954443 | 1.762279 | 1.585186
41 | 4.341364 | 3.529154 | 3.059272 | 2.714565 | 2.435639 | 2.197445 | 1.986961 | 1.796500 | 1.621118
42 | 4.360631 | 3.551424 | 3.083756 | 2.740962 | 2.463796 | 2.227280 | 2.018434 | 1.829596 | 1.655842
43 | 4.379382 | 3.573076 | 3.107544 | 2.766588 | 2.491111 | 2.256203 | 2.048923 | 1.861634 | 1.689430
44 | 4.397644 | 3.594143 | 3.130670 | 2.791484 | 2.517629 | 2.284264 | 2.078483 | 1.892675 | 1.721950
45 | 4.415439 | 3.614654 | 3.153169 | 2.815688 | 2.543394 | 2.312510 | 2.107166 | 1.922775 | 1.753463
46 | 4.432790 | 3.634635 | 3.175071 | 2.839235 | 2.568444 | 2.337984 | 2.135019 | 1.951985 | 1.784025
47 | 4.449718 | 3.654111 | 3.196406 | 2.862158 | 2.592816 | 2.363725 | 2.162086 | 1.980354 | 1.813688
48 | 4.466242 | 3.673108 | 3.217201 | 2.884486 | 2.616542 | 2.388771 | 2.188406 | 2.007924 | 1.842500
49 | 4.482379 | 3.691645 | 3.237481 | 2.906249 | 2.639654 | 2.413155 | 2.214017 | 2.034738 | 1.870505
50 | 4.498147 | 3.709744 | 3.257268 | 2.927472 | 2.662181 | 2.436910 | 2.238954 | 2.060832 | 1.897744
51 | 4.513562 | 3.727424 | 3.276586 | 2.948181 | 2.684150 | 2.460065 | 2.263249 | 2.086242 | 1.924256
52 | 4.528637 | 3.744702 | 3.295455 | 2.968397 | 2.705587 | 2.482647 | 2.286933 | 2.110999 | 1.950074
53 | 4.543388 | 3.761597 | 3.313894 | 2.988142 | 2.726514 | 2.504683 | 2.310032 | 2.135136 | 1.975233
54 | 4.557827 | 3.778123 | 3.331921 | 3.007438 | 2.746955 | 2.526197 | 2.332574 | 2.158679 | 1.999762
55 | 4.571967 | 3.794295 | 3.349553 | 3.026301 | 2.766929 | 2.547210 | 2.354583 | 2.181655 | 2.023691
56 | 4.585818 | 3.810128 | 3.366805 | 3.044750 | 2.786457 | 2.567746 | 2.376082 | 2.204090 | 2.047045
57 | 4.599393 | 3.825635 | 3.383695 | 3.062803 | 2.805556 | 2.587823 | 2.397093 | 2.226007 | 2.069851
58 | 4.612701 | 3.840828 | 3.400234 | 3.080474 | 2.824244 | 2.607460 | 2.417635 | 2.247427 | 2.092132
59 | 4.625752 | 3.855719 | 3.416437 | 3.097778 | 2.842538 | 2.626675 | 2.437728 | 2.268371 | 2.113909
60 | 4.638556 | 3.870319 | 3.432316 | 3.114730 | 2.860452 | 2.645484 | 2.457390 | 2.288858 | 2.135204
61 | 4.651122 | 3.884639 | 3.447884 | 3.131343 | 2.878001 | 2.663904 | 2.476638 | 2.308906 | 2.156036
62 | 4.663457 | 3.898688 | 3.463151 | 3.147629 | 2.895199 | 2.681948 | 2.495488 | 2.328534 | 2.176423
63 | 4.675569 | 3.912477 | 3.478128 | 3.163599 | 2.912058 | 2.699632 | 2.513954 | 2.347756 | 2.196382
64 | 4.687467 | 3.926014 | 3.492827 | 3.179267 | 2.928591 | 2.716968 | 2.532052 | 2.366588 | 2.215931
65 | 4.699157 | 3.939308 | 3.507255 | 3.194641 | 2.944809 | 2.733968 | 2.549794 | 2.385044 | 2.235084
66 | 4.710646 | 3.952367 | 3.521423 | 3.209732 | 2.960724 | 2.750646 | 2.567194 | 2.403139 | 2.253356
67 | 4.721941 | 3.965199 | 3.535339 | 3.224550 | 2.976347 | 2.767011 | 2.58426 | 2.420886 | 2.272261
68 | 4.733047 | 3.977811 | 3.549011 | 3.239104 | 2.991685 | 2.783075 | 2.601013 | 2.438295 | 2.290312
69 | 4.743971 | 3.990210 | 3.562448 | 3.253403 | 3.006751 | 2.798849 | 2.617456 | 2.455380 | 2.308021
70 | 4.754718 | 4.002402 | 3.575656 | 3.267455 | 3.021552 | 2.814341 | 2.633601 | 2.472152 | 2.325401
71 | 4.765294 | 4.014395 | 3.588644 | 3.281267 | 3.036096 | 2.829561 | 2.649458 | 2.488620 | 2.342462
72 | 4.775704 | 4.026195 | 3.601418 | 3.294848 | 3.050393 | 2.844517 | 2.665037 | 2.504796 | 2.359216
73 | 4.785953 | 4.037806 | 3.613984 | 3.308204 | 3.064450 | 2.859219 | 2.580347 | 2.520688 | 2.375673
74 | 4.796045 | 4.049236 | 3.626349 | 3.321343 | 3.078275 | 2.873674 | 2.695396 | 2.536305 | 2.391841
75 | 4.805985 | [illegible] | [illegible] | [illegible] | 3.091874 | 2.887890 | 2.710192 | 2.551658 | 2.407731
76 | 4.815777 | 4.071569 | 3.650499 | [illegible] | 3.105254 | 2.901874 | 2.724744 | 2.566752 | 2.423351
77 | 4.825426 | 4.082483 | 3.662296 | [illegible] | 3.118422 | 2.915633 | 2.739059 | 2.581598 | 2.438709
78 | 4.834935 | 4.093235 | 3.673914 | [illegible] | 3.131384 | 2.929174 | 2.753143 | 2.596201 | 2.453615
79 | 4.844308 | 4.103829 | 3.685358 | [illegible] | 3.144146 | 2.942503 | 2.767004 | 2.610571 | 2.468674
80 | 4.853549 | 4.114270 | 3.696633 | [illegible] | 3.156714 | 2.955626 | 2.780649 | 2.624712 | 2.483296
81 | 4.862661 | 4.124561 | 3.707745 | [illegible] | 3.169094 | 2.968550 | 2.794083 | 2.638633 | 2.497686
82 | 4.871648 | 4.134708 | 3.718696 | [illegible] | 3.181289 | 2.981279 | 2.807312 | 2.652339 | 2.511851
83 | 4.880513 | 4.144713 | 3.729492 | [illegible] | 3.193307 | 2.993819 | 2.820343 | 2.665836 | 2.525799
84 | 4.889259 | 4.154581 | 3.740137 | [illegible] | 3.205150 | 3.006176 | 2.833180 | 2.679131 | 2.539534
85 | 4.897890 | 4.164315 | 3.750635 | 3.453206 | 3.216825 | 3.018355 | 2.845830 | 2.692229 | 2.553063
86 | 4.906407 | 4.173918 | 3.760988 | 3.464176 | 3.228335 | 3.030359 | 2.858296 | 2.705135 | 2.566393
87 | 4.914814 | 4.183393 | 3.771203 | 3.474994 | 3.239685 | 3.042194 | 2.870585 | 2.717855 | 2.579527
88 | 4.923114 | 4.192745 | 3.781280 | 3.485666 | 3.250879 | 3.053864 | 2.882700 | 2.730393 | 2.592471
89 | 4.931308 | 4.201975 | 3.791225 | 3.496196 | 3.261921 | 3.065374 | 2.894647 | 2.742754 | 2.605232
90 | 4.939401 | 4.211087 | 3.801040 | 3.506585 | 3.272815 | 3.076727 | 2.906429 | 2.754944 | 2.617812
91 | 4.947393 | 4.220084 | 3.810729 | 3.516839 | 3.283565 | 3.087928 | 2.918051 | 2.766965 | 2.630218
92 | 4.955288 | 4.228968 | 3.820294 | 3.526960 | 3.294173 | 3.098980 | 2.929517 | 2.778824 | 2.642452
93 | 4.963087 | 4.237743 | 3.829739 | 3.536952 | 3.304644 | 3.109887 | 2.940830 | 2.790523 | 2.654521
94 | 4.970794 | 4.246410 | 3.839066 | 3.546818 | 3.314980 | 3.120652 | 2.951995 | 2.802066 | 2.666428
95 | 4.978409 | 4.254972 | 3.848278 | 3.556560 | 3.325186 | 3.131279 | 2.963015 | 2.813458 | 2.678176
96 | 4.985935 | 4.263431 | 3.857378 | 3.566181 | 3.335264 | 3.141772 | 2.973894 | 2.824702 | 2.689771
97 | 4.993374 | 4.271790 | 3.866368 | 3.575685 | 3.345216 | 3.152132 | 2.984634 | 2.835802 | 2.701215
98 | 5.000728 | 4.280051 | 3.875251 | 3.585074 | 3.355047 | 3.162364 | 2.995240 | 2.846761 | 2.712512
99 | 5.007998 | 4.288217 | 3.884029 | 3.594350 | 3.364759 | 3.172471 | 3.005713 | 2.857582 | 2.723666
100 | 5.015187 | 4.296289 | 3.892705 | 3.603517 | 3.374354 | 3.182455 | 3.016059 | 2.868269 | 2.734680

H. Guthrie, has computed tables of var $w_r$, accurate to within a unit in the
fifth decimal place, for r = 0 (1) 8 and n = (2r + 2) (1) 100, using the Univac
Scientific (ERA 1103) computer. Since Cadwell's expression for the probability
density function of $w_r$ involves an integral with respect to another variable $x$,
it was necessary to integrate numerically with respect to both $w_r$ and $x$. A
seven-point integration formula was employed for a few cases where the trapezoidal
rule did not give sufficient accuracy for the integral with respect to $w_r$, while
the trapezoidal rule was used for all cases in the integration with respect to $x$.
The variances, which are shown in Table 2, agree with those obtained by Tippett,
Pearson, and Cadwell to within a unit in the last place published by them. The
values in Table 2 also agree, to within a unit in the fifth decimal place, with
results computed from the Sarhan and Greenberg table of variances and
covariances of order statistics for n = 2 (1) 20.

1.3. Covariance of two quasi-ranges. In order to determine the variances of
unbiased estimates based on linear combinations of two quasi-ranges (and
hence their efficiencies), it is necessary to know not only the variances of the
two quasi-ranges but also their covariance. The covariance of the rth and r'th
quasi-ranges for samples of size n is given by
$\operatorname{cov}(w_r, w_{r'}) = E(w_r w_{r'}) - E(w_r)E(w_{r'})$,
in which $E(w_r w_{r'}) = 2[E(x_{r+1} x_{r'+1}) - E(x_{r+1} x_{n-r'})]$. Wilks ([21],
p. 20) has given an expression for the joint probability density function of the
kth and k'th order statistics. Godwin [7] has tabulated (to five decimal places)
the covariances of all order statistics for samples of n = 2 (1) 10. The more
extensive and more precise tables of Sarhan and Greenberg [17] were mentioned
in the preceding paragraph. The author, again with the assistance of Eugene H.
Guthrie, undertook the task of computing $\operatorname{cov}(w_r, w_{r'})$ for
$0 \le r < r' \le 8$ and n = (2r' + 2) (1) 100, accurate to within a unit in the
fifth decimal place, using the Univac Scientific (ERA 1103) computer. Finding
that the complete tabulation required too much machine time, he decided to limit
the computations to those values required to determine the most efficient
estimates of the population standard deviation based on linear combinations of
two adjacent quasi-ranges and of two quasi-ranges among those with
$r < r' \le 8$ for n = 4 (1) 100, together with the numerical values of the
efficiencies of these estimates. Wilks' expression for the joint probability
density function of $x_k$ and $x_{k'}$ was first integrated with respect to
$x_{k'}$ between the limits $x_k$ and $\infty$ by using a seven-point integration
formula; the result was then integrated with respect to $x_k$ between the limits
$-\infty$ and $\infty$ by employing the trapezoidal rule. For n = 4 (1) 20, the
expected values of products of order statistics computed in this manner agree
with Teichroew's values to within a unit in the sixth decimal place, and the
covariances of quasi-ranges computed from these results agree to within a unit
in the fifth decimal place with those computed from the covariances of order
statistics tabulated by Sarhan and Greenberg.
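
The two-stage integration just described can be illustrated by a minimal sketch. It is not the author's program: it assumes the standard form of the joint density of two order statistics from a normal sample, uses the trapezoidal rule for both integrations (the paper used a seven-point formula for the inner one), and truncates the range at an arbitrary ±8. The function names, grid size, and limits are all illustrative choices.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def joint_density(x, y, k, kp, n):
    """Joint density of the k-th and kp-th order statistics (k < kp) at x <= y."""
    const = math.factorial(n) / (
        math.factorial(k - 1) * math.factorial(kp - k - 1) * math.factorial(n - kp)
    )
    fx, fy = Phi(x), Phi(y)
    return (const * fx ** (k - 1) * (fy - fx) ** (kp - k - 1)
            * (1.0 - fy) ** (n - kp) * phi(x) * phi(y))

def expected_product(k, kp, n, lo=-8.0, hi=8.0, steps=320):
    """Approximate E(x_k * x_kp) by an inner integral over y >= x, then over x."""
    h = (hi - lo) / steps
    grid = [lo + i * h for i in range(steps + 1)]
    outer = 0.0
    for i, x in enumerate(grid):
        ys = grid[i:]                      # inner limits: from x up to hi
        inner = 0.0
        if len(ys) > 1:
            for j, y in enumerate(ys):
                w = 0.5 if j in (0, len(ys) - 1) else 1.0
                inner += w * x * y * joint_density(x, y, k, kp, n)
            inner *= h
        w_outer = 0.5 if i in (0, steps) else 1.0
        outer += w_outer * inner
    return outer * h

# For n = 2 the product of the two order statistics is the product of the two
# observations, so its expectation is zero; for n = 3 the known value of
# E(x_1 * x_3) is -sqrt(3)/pi.
e12 = expected_product(1, 2, 2)
e13 = expected_product(1, 3, 3)
```

With expected products of this kind in hand, the covariance of two quasi-ranges follows from the identity for $E(w_r w_{r'})$ given above.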

1.4. Unbiased estimates of population standard deviation. The minimum variance
unbiased estimate (which will hereafter be called the efficient estimate) of
population standard deviation $\sigma$ is the one based on the sample standard
deviation s, and given by the equation $\hat\sigma = s/c_2$, where
$s = [\sum (x - \bar{x})^2/(n - 1)]^{1/2}$ and
$c_2 = [2/(n - 1)]^{1/2}\,\Gamma(n/2)/\Gamma[(n - 1)/2]$. The unbiased estimate of
$\sigma$ based on one sample quasi-range is given by the equation
$\hat\sigma_r = w_r/E(w_r)$, while the unbiased estimate of $\sigma$ based on a
linear combination of two sample quasi-ranges is given by the equation

(1)    $\hat\sigma_{r,r'} = \dfrac{w_r + \lambda_{r,r'}\, w_{r'}}{E(w_r) + \lambda_{r,r'}\, E(w_{r'})},$


TABLE 2

Variance of the rth Quasi-Range for Samples of n from N(μ, 1)

.32315

.34734 | .12588
.35350 | .18023
.35231 | .20590 | .07602
.34796 | .21829 | .11578
.34231 | .22394 | .13799

.33619 | .22594 | .15081
.33003 | .22589 | .15830    .03649
.32402 | .22466 | .16256    .05980
.31827 | .22275 | .16481    .07525
.31280 | .22045 | .16576    .08577

.30763 | .21796 | .16588    .09305
.55361 | .30275 | .21537 | .16544    .09814
.54551 | .29816 | .21277 | .16462    .10171
.53799 | .29382 | .21019 | .16356    .10419
.53098 | .28973 | .20765 | .16234    .10588

.52442 | .28586 | .20518 | .16101    .10700
.51827 | .28220 | .20279 | .15963    .10768
.51249 | .27874 | .20048 | .15822    .10805
.50703 | .27545 | .19825 | .15679    .10817
.50188 | .27233 | .19611 | .15537    .10810

.49699 | .26936 | .19404 | .15396    .10789
.49236 | .26653 | .19205 | .15258    .10757
.48796 | .26383 | .19014 | .15122    .10717
.48377 | .26126 | .18830 | .14989    .10671
.47977 | .25879 | .18652 | .14859    .10619

.47595 | .25644 | .18481 | .14732    .10565
.47229 | .25417 | .18316 | .14609    .10507
.46879 | .25201 | .18158 | .14489    .10448
.46544 | .24992 | .18004 | .14372    .10388
.46221 | .24792 | .17857 | .14259    .10327
TABLE 2 (continued)

Variance of the rth Quasi-Range for Samples of n from N(μ, 1)

.38890

.38780
.38672
.38566
.38463
.38360

.38260
.38161
.38064
.37969
.37874

.37782
.37691
.37601 | .19424
.37513 | .19370
.37426 | .19316

.37340 | .19264
.37256 | .19212
.37173 | .19160
.37091 | .19109
.37010 | .19060

.36931 | .19011
.36852 | .18963
.36774 | .18916
.36699 | .18868
.36624 | .18822


where $\lambda_{r,r'}$ is a weighting factor. In the expressions for $\hat\sigma_r$
and $\hat\sigma_{r,r'}$, the expected values are understood to be those for samples
drawn from N(0, 1), the standard normal population.
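
The efficient estimate $s/c_2$ is easy to evaluate directly. The following is a small sketch, with hypothetical function names; it uses the log-gamma function only to avoid overflow for large n, which is an implementation choice rather than anything stated in the paper.

```python
import math

def c2(n):
    """c_2 = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), via log-gamma."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

def efficient_estimate(sample):
    """Unbiased estimate s / c_2 of sigma for a sample assumed normal."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return s / c2(n)

# Divisors of s appearing in the text: .797885 for n = 2, .986934 for n = 20.
c2_n2 = c2(2)
c2_n20 = c2(20)
```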

TABLE 3

Efficiency (Percent) of Estimate of Population Standard Deviation Based on the
rth Quasi-Range for Samples of n from N(μ, σ²)

1.5. Efficiency of unbiased estimates of σ. The efficiency of the efficient
estimate $\hat\sigma$ is by definition 1 (100%). The efficiency of a substitute
estimate is defined as the ratio of the variance of the efficient estimate to the
variance of the substitute estimate. Thus the efficiency of $\hat\sigma_r$ is
given by Eff $\hat\sigma_r$ = var $\hat\sigma$/var $\hat\sigma_r$, while the
efficiency of $\hat\sigma_{r,r'}$ is given by
Eff $\hat\sigma_{r,r'}$ = var $\hat\sigma$/var $\hat\sigma_{r,r'}$, where
var $\hat\sigma = [(1 - c_2^2)/c_2^2]\,\sigma^2$. By varying the weighting factor
$\lambda_{r,r'}$, one may obtain a one-parameter family of unbiased estimates
$\hat\sigma_{r,r'}$. However, there is just one value of $\lambda_{r,r'}$ which
minimizes $V_{r,r'} = \operatorname{var} \hat\sigma_{r,r'}$, and hence maximizes
Eff $\hat\sigma_{r,r'}$. This value of $\lambda_{r,r'}$ which maximizes the
efficiency of the estimate may be obtained by setting
$dV_{r,r'}/d\lambda_{r,r'} = 0$ and solving for $\lambda_{r,r'}$, which yields

(2)    $\lambda_{r,r'} = \dfrac{E(w_{r'}) \operatorname{var} w_r - E(w_r) \operatorname{cov}(w_r, w_{r'})}{E(w_r) \operatorname{var} w_{r'} - E(w_{r'}) \operatorname{cov}(w_r, w_{r'})}.$
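
Equation (2) can be checked numerically against the variance it is meant to minimize. The sketch below is only an illustration: the moments are hypothetical round numbers of plausible size, not values taken from the paper's tables, and the function names are invented here.

```python
def optimal_weight(e_r, e_rp, var_r, var_rp, cov):
    """Weight lambda from equation (2)."""
    return (e_rp * var_r - e_r * cov) / (e_r * var_rp - e_rp * cov)

def variance_of_estimate(lam, e_r, e_rp, var_r, var_rp, cov):
    """Variance of (w_r + lam * w_r') / (E(w_r) + lam * E(w_r')) for sigma = 1."""
    numerator = var_r + 2.0 * lam * cov + lam * lam * var_rp
    return numerator / (e_r + lam * e_rp) ** 2

# Hypothetical moments for two quasi-ranges (illustrative values only).
moments = dict(e_r=3.7, e_rp=2.8, var_r=0.53, var_rp=0.29, cov=0.30)

lam = optimal_weight(**moments)
v_min = variance_of_estimate(lam, **moments)
v_left = variance_of_estimate(lam - 0.05, **moments)
v_right = variance_of_estimate(lam + 0.05, **moments)
```

With these inputs the weight comes out near 1.6, and the variance at that weight is smaller than at neighboring weights and smaller than the variance of either single-quasi-range estimate (the cases $\lambda = 0$ and $\lambda \to \infty$).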


Table 3 shows the efficiency of estimates based on $w_r$ for r = 0 (1) 8 and
n = (2r + 2) (2) 50 (5) 100, accurate to within 0.01%. Table 4 gives the
TABLE 4

Most Efficient Unbiased Estimates of Standard Deviation of Normal Population

Sample size, n | Based on one quasi-range | Based on a linear combination of two adjacent quasi-ranges | Based on a linear combination of two quasi-ranges among those with r < r' ≤ 8 | Efficient estimate


- 886227 w 

- 590818 w, 
+ 485731 w, 
+ 429936 > 


- 394569 wy, 
- 369774 we 
. 351222 
. 336697 
« 324939 


- 315172 

; 306894 we, 
. 299762 wo 
+ 293534 w, 
. 288033 wo, 


- 283127 we 
- 278716 we 
. 370257 wy 
» 362335 wy 
+ 355214 wy 


- 348768 wy 
. 342899 wy 
. 337526 wy 
. 332584 wy 
. 328019 wy) 


45394 (w, 
. 37238 (w, 


+ 31803 (w, 
+27922 (wo 
+25010 (w, 
~ 22745 (w 

+ 20931 (wy 


+ 19444 (wy 
+ 18203 (w, 
+ 17150 (wy 
» 16244 (w, 
- 15457 (w, 


- 14765 (w, 

- 14153 (we 

«13606 (w,, 

- 13114 (we 

+ 12670 (w,, 
~ 


+ 12266 (w, 
- 11897 (wa 
+ 11558 (w, 
. 11246 (w 

. 10958 (we 


+0, 2427 w)) 
+0, 3631 w)) 


+0, 4752 w)) 
+0,5790 wy) 
+0, 6754 wy) 
+0, 7651 w)) 
+0. 8489 w)) 


+0,9276 wy) 
+ 1.0017 wy) 
+ 1.0717 w,) 
+ 1.1381 wy) 
41.2011 wy) 


+1,2612 w,) 
+1, 3186 w)) 
% 1, 3736 wy) 
+1, 4263 w)) 
+ 1.4769 w)) 


+ 1.5256 w,) 
+1.5726 w)) 
+ 1.6179 wy) 
+1,6617 w)) 
+ 1.7040 wi) 


- 45394 (w,, 
. 37238 (we 


+ 31803 (w, 
«27922 (we 
- 25010 (w, 
22745 (wo 
- 20931 (we, 


- 19444 (w, 
~21177 (w, 
«19848 (wo 
- 18704 (w, 
. 17708 (we 


- 16834 (we 
- 16060 (wo 
- 15369 (w, 
- 14750 (we, 
14192 (we 


« 13684 (w, 
+ 14637 (w, 
+ 14129 (we 
+ 13663 (wy 
- 13233 (wy 


+0, 2427 w,) 
+0, 3631 wy) 


+0, 4752 w)) 
+0, 5790 wy) 
+0, 6754 w}) 
+0, 7651 w}) 
+0, 8489 w;) 


+0,9276 w)) 
+0,9231 w) 
+1,0015 w2) 
+1,0762 w2) 
+1, 1477 w2) 


+1,2161 w») 
+ 1.2817 w) 
+1, 3448 w)) 
+ 1,4055 w9) 
+ 1, 4640 w9) 


+1. 5206 wp) 
+ 1.5298 w3) 
+1.5881 w3) 
+ 1, 6446 w3) 
+ 1.6996 ws) 


s/. 797885 
s/, 886227 
s/, 921318 
o/, 939986 


s/. 951533 
s/. 959369 
s/. 965030 
s/. 969311 
s/. 972659 


s/. 975350 
s/. 977559 
s/. 979406 
s/. 980971 
s/, 982316 


s/. 983484 
s/. 984506 
s/, 985410 
s/,986214 
s/, 986934 


s/, 987583 
s/,988170 
s/, 988705 
s/. 989193 
s/. 989640 




+ 323785 w 
319844 wy 
- 316165 wy 
+ 312719 wy 
. 309483 wy 


« 306436 wy 
. 357181 w, 
- 353016 w2 
- 349094 w2 
- 345394 we 


- 341895 w, 
. 338580 w2 
- 335433 w2 
+ 332442 w2 
+ 329593 wz 


+ 326875 w2 
«324280 w2 
+ 321798 w2 
+ 319420 wo 
+ 317141 w2 


- 352208 w, 
. 349387 w3 
- 346682 w3 
- 344086 w3 
- 341592 w3 


- 10691 (wy 
+ 10442 (wo 
. 10209 (w,, 
- 09992 (w, 
- 09788 (wa 


«09596 (w, 
- 09416 (w, 
09245 (wo 
. 14484 (w, 
- 14201 (w, 


«13934 (wy 
. 13680 (wy 
« 13440 (wy 
- 13211 (wy 
- 12995 (wy 


. 12788 (wy 
+ 12592 (wy 
- 12403 (w, 
+ 12223 (w 
12051 (wy 


- 11886 Aw, 
< 11727 (wy 
. 15270 (wz 
. 15056 (w2 
. 14850 (w2 


+1. 7451 w)) 
+ 1.7849 w)) 
+ 1, 8235 w)) 
+1, 8610 w)) 
+1,8975 w)) 


+1.9329 w}) 
+ 1.9675 w}) 
+2, 0011 wy) 
+1,2398 wz) 
+1, 2646 w>) 


+1, 2888 wy) 
41,3126 w2) 
+1, 3358 wy) 
+1. 3586 w) 
+1, 3806 w2) 


+1, 4026 w2) 
+1,4236 w2) 
+ 1, 4448 w2) 
+1, 4652 w2) 
+ 1.4853 w2) 


+1,5051 w2) 
+1,5246 w2) 
+ 1.1550 w3) 
+1,1714 w3) 
+1, 1877 w3) 




- 12836 (wy 
- 12468 (wo, 
- 12126 (wo 
- 11807 (we 
» 11508 (we 


+ 11229 (w, 
- 10967 (w, 
+ 11551 (we 
+ 11282 (wo 
- 11028 (w,, 


. 10788 (w, 
. 10561 (we 
. 10345 (w,, 
- 10141 (we 
- 09947 (wo 


- 09762 (wy 
- 09586 (we, 
09418 (wy 
. 09258 (we 
+ 09647 (w,, 


- 09482 (we 
+ 09323 (wy 
+ O9171 (wy 
+ 09025 (w,, 
- 08885 (w,, 


+1, 7529 w,;) 
+1, 8050 w3) 
+ 1, 8556 w3) 
+1, 9050 w3) 
+ 1.9532 w3) 


+2, 0002 w3) 
+2, 0461 w3) 
+2, 0673 wa) 
+2. 1150 w4) 
+2, 1616 wa) 


+2, 2073 wa) 
+2.2520 wa) 
+2. 2962 w4) 
+2. 3393 wa) 
+2. 3816 w4) 


+2, 4232 wa) 
+2, 4640 wa) 
+2, 5042 wa) 
+ 2.5435 wg) 
+2, 5741 ws) 


+2, 6150 ws) 
+2, 6553 we) 
+2. 6951 we) 
+2. 7341 ws) 
+2.7726 ws) 




s/, 990052 
s/, 990433 
s/, 990786 
sh F9Lll3 
sA991418 


34991703 


3/, 992675 


s/, 992884 
s/, 993080 
8/,993267 
3/, 993443 
s/,993611 


s/,993770 
8/,993922 
8 /, 994066 
8 /,994203 
8/,994335 


3/, 994460 
s/, 994580 
84994695 
3/4. 994806 
84994911 













Sample 
size,n 


66 
67 


68 
69 


70 


71 
72 
73 
74 
75 






Sample 
size,n 







100 




































. 347463 wa 
- 345399 wg 
- 343400 w4 
- 341461 w4 
. 339581 wg 










. 337755 wy 
. 335982 w4 
334260 w4 
. 332585 w4 
- 330956 w4 






~ 339192 wy ) 

« 336882 w, - 14460 (w, 
- 334656 w3 . 14278 (w2 
. 332509 w3 « 14100 (w9 
. 339436 w3 . 13931 (wo 


- 328434 w3 . 13763 (w2 
- 326498 w3 . 13606 (w2 
« 324625 w3 . 13453 (wp 
- 322812 w3 . 13304 (we 
. 321055 w3 - 13161 (wo 





- 13022 (w2 
. 15710 (w3 
. 15534 (w3 
. 15370 (w3 
- 15210 (w3 


+ 15052 (w3 
+ 14895 (wy 
« 14750 (w3 
+ 14607 (w3 
« 14466 (w3 













3 
+1,2195 w3) 
+ 1.2348 w3) 
+1, 2504 w3) 
+ 1.2652 w3) 


+1, 2805 w3) 
+1,2949 w3) 
+1, 3093 w3) 
+1, 3236 w3) 
+1, 3374 w3) 


+1, 3514 w3) 
+1, 1114 w4) 
+1, 1243 w4) 
+1, 1360 w4) 
+1, 1478 w4) 


+1, 1598 w4) 
+1,1723 w4) 
+ 1.1835 w4) 
+ 1.1949 w4) 
+ 1.2064 w4) 


























- 11693 (wy 

















- 329370 wg 
. 327827 wg 
- 326323 wg 
. 324857 wg 
- 346274 we 










. 344605 67, 08 
« 342979 
- 341393 
- 339847 
- 338338 
























- 336865 
- 335427 
334022 
« 332649 
- 331306 





























- 329994 


- 328710 
327454 


- 345465 
- 344065 














+ 342694 
- 341353 
- 340040 
. 338754 
- 337494 








- 336259 
. 335049 
- 333863 
- 332700 
- 331559 w 













“6 
“6 
6 


w 












¢ 


- 14333 (w3 
- 14201 (w3 
- 14071 (w3 
- 13943 (w3 
- 13821 (w3 


13703 (w 


. 15847 (wy 
. 15704 (w4 
. 15568 (w4 
- 15431 (w4 


- 15305 (wa 
- 15176 (wa 
- 15056 (w4 
- 14934 (wg 
- 14809 (w4 


14697 (w4 


. 14584 (w 
+ 14475 (wg 


14363 (wy 


- 14259 (wy 


- 16064 (ws 
- 15934 (ws 
. 15823 (ws 
- 15705 (we 
+ 15592 (we 


. 15480 (we 
. 15366 (we 
. 15261 (we 
- 15161 (ws 
- 15059 (ws 




















+1,.2172 w4) 
+ 1.2284 wg) 
+1.2395 w4) 
+1.2510 wa) 
+ 1.2617 wa) 




+1.2723 w4) 
+ 1.0948 ws) 
+1. 1049 ws) 
+1.1145 we) 
+1.1245 ws) 













+1,1334 
+1, 1432 
+1,1519 
+1,1612 
+1.1714 


ws) 
ws) 
ws) 
ws) 
ws) 


+1. 1800 


+1,1890 
+1.1977 


+1.2071 
+1,2156 


ws) 
ws) 
ws) 
ws) 
ws) 


+1,0751 
+1. 0844 
+1.0916 
+1,0999 
+1. 1078 


we) 
we) 
we) 
we) 
we) 


+1.1158 
+1,1244 
+1,1319 
+1,.1389 
+1, 1466 


we) 
we) 
we) 
we) 
we) 











- 11083 (w) 

























. 12710 (wy 
12551 (wy 


- 10453 (w, +2. 1956 


. 10357 (wy + 2.2145 
- 10264 (w) 
- 10171 (wy + 2.2515 
. 10082 (wy + 2, 2696 
- 09996 (wy 


- 09910 (wy 
. 09826 (wy 
. 09746 (w) 
- 09665 (wy 
- 09588 (wy 


09511 (wy 
. 09437 (wy 


- 09293 (wy 
- 09223 (w) 42.4595 we) 


- 09155 (w) 
- 09087 (w) 
- 09022 (w, 
- 08957 (w) +2. 5245 wa) 
. 08895 (w 


















08 
08621 (w, 
08497 (w, 
08377 (w, 
08262 (w, 


« 510 
+2, 8479 
+2. 8847 
42.9212 
+2. 9567 


5 
ws) 
ws) 
ws) 
ws) 





13787 (w) 
13592 (w) 
13403 (w, 
13222 (w) 
13046 (w) 


+1, 6820 wg) 
+ 1. 7062 weg) 
+1, 7303 wg) 
+1. 7539 wg) 
+ 1.7774 wg) 





12875 (wv) +1, 8006 
Fig. 1. Efficiency of Estimates of Standard Deviation for Normal Population.
(Horizontal axis: sample size, n, from 0 to 100; vertical axis: efficiency of
estimates based on quasi-ranges, in percent.)



most efficient estimate based on one sample quasi-range, together with its
efficiency, for n = 2 (1) 100, as well as the most efficient estimates based on
linear combinations of two adjacent quasi-ranges and of two quasi-ranges among
those with $r < r' \le 8$, together with their efficiencies, for n = 4 (1) 100,
and the efficient estimate based on the sample standard deviation. For the
estimates based on one sample quasi-range, the numerical coefficients
$1/E(w_r)$ are accurate to within a unit in the sixth decimal place, and the
efficiencies are accurate to within 0.01%. For the estimates based on a linear
combination of two sample quasi-ranges, the numerical coefficients
$1/E(w_r + \lambda_{r,r'} w_{r'})$ are accurate to within a unit in the fourth
decimal place, the values of $\lambda_{r,r'}$ are accurate to within a unit in the
third decimal place, and the efficiencies are accurate to within 0.01%. The
efficiency of the estimates based on quasi-ranges is shown graphically by
Figure 1. It will be noted that for n < 56 the estimate based on the best linear
combination of two quasi-ranges among those with $r < r' \le 8$ always involves
the range (r = 0), with $1 \le r' \le 5$, while the best such estimate for
$56 \le n \le 100$ is $\hat\sigma_{1,8}$, with the efficiency dropping to 82.71%
for n = 100. It seems likely that slightly better estimates for n near 100 could
be obtained by dropping the restriction $r' \le 8$, but it is doubtful whether the
increase in efficiency would exceed 1%, which would hardly justify the
additional computation of expected values, variances, and covariances required
to obtain such estimates. One may wonder why the estimate based on the best
linear combination of two quasi-ranges is not the one based on the two
quasi-ranges which do best individually. The reason (see also K. Pearson [15])
is that these two quasi-ranges, which are certainly adjacent ones, are too
highly correlated to do best together.

1.6. Comparison with Grubbs-Weaver estimates. Grubbs and Weaver [8] have
proposed estimates of the population standard deviation based on a weighted
average of the ranges of random subgroups of the complete sample. Since the
optimum size of such subgroups is 8, the sample is divided into subgroups which
are as nearly as possible of size 8. If the sample size is an integral multiple of 8,
all subgroups are of size 8; otherwise, some subgroups are not of size 8, since
no observations are discarded. The Grubbs-Weaver estimate is always more
efficient than the estimate $\hat\sigma_r$ based on one sample quasi-range, except
for n < 12, when it is identical with $\hat\sigma_0$, both using the range of the
complete sample, and always less efficient than the estimate based on the best
linear combination of two quasi-ranges. The efficiency of the Grubbs-Weaver
estimate $\hat\sigma_{GW}$ for sample sizes up through 100 is shown in Figure 1
along with the efficiencies of the estimates based on quasi-ranges. The
irregularities in the efficiency of $\hat\sigma_{GW}$ are due partly to the
inherent nature of the estimates and partly to the fact that the number of
decimal places carried by Grubbs and Weaver is sufficient to yield values of
the efficiency accurate only to within about 0.1%. The asymptotic efficiency of
the Grubbs-Weaver estimate is 75.38%, since for samples of size 8,
var $w_0$ = .67212 and $E(w_0)$ = 2.8472, so that var $\hat\sigma_0$ = .08291, and
hence the variance of $\hat\sigma_{GW}$, which is the mean of n/8 such estimates,
is .08291/(n/8) = .6633/n, as compared with an asymptotic variance
1/2n = .5/n for $\hat\sigma$. By using results given by K. Pearson [15] and by
Benson [1], one can easily show that the corresponding asymptotic efficiencies
are 65.23% for estimates based on one quasi-range and also for those based on
the best linear combination of two adjacent quasi-ranges, and approximately
80.08% for estimates based on the best linear combination of two quasi-ranges.
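
The arithmetic behind the 75.38% figure can be retraced in a few lines. This is just a restatement of the calculation in the text, using the two constants quoted there for samples of size 8; the variable names are illustrative.

```python
# Constants quoted in the text for the range of a normal sample of size 8.
var_w0 = 0.67212      # variance of the range
e_w0 = 2.8472         # expected value of the range

# Variance of the unbiased estimate w_0 / E(w_0) from one subgroup of 8.
var_subgroup = var_w0 / e_w0 ** 2

# The Grubbs-Weaver estimate averages n/8 such estimates, so n * variance is:
n_times_var_gw = 8 * var_subgroup

# The efficient estimate has asymptotic variance 1/(2n), i.e. n * variance = 0.5.
asymptotic_efficiency = 0.5 / n_times_var_gw
```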

1.7. Example. As an example of the use of estimates based on sample
quasi-ranges, consider the following data, given by Morse and Kimball ([10], p. 134)
and assumed to come from a normal population, which represent the deviation
(in one dimension) from the aiming point of the mean point of impact of salvos
of two projectiles:

-237  -133  -93  -77  -75  -70  -66  -65  -34  -28
 -23   -13  -10   57   65  142  154  173  209  270

Quasi-ranges:
$w_0$ = 270 - (-237) = 507
$w_1$ = 209 - (-133) = 342
$w_2$ = 173 - (-93) = 266

Sample standard deviation: s = 127.2

Estimates of population standard deviation:
$\hat\sigma_1$ = .355214 $w_1$ = 121.5
$\hat\sigma_{0,1}$ = .12670 ($w_0$ + 1.4769 $w_1$) = 128.2
$\hat\sigma_{0,2}$ = .14192 ($w_0$ + 1.4640 $w_2$) = 127.2
$\hat\sigma$ = s/.986934 = 128.9
Morse and Kimball plotted the data on normal probability paper, fitted a straight
line "by eye", and estimated the standard deviation as the difference between
the 84% point and the 50% point, the result being 161, a value nearly 25%
greater than the efficient estimate. A much better result could have been
obtained by using an estimate based on a single quasi-range, and a still better one
by using an estimate based on a linear combination of two quasi-ranges. It is
also easier to arrange the data in order and make the simple quasi-range
calculations shown above than to plot the data on normal probability paper (though
one may want to do the latter for other reasons). Moreover, it is really not
necessary to arrange all the data in order; it would suffice in this example to
pick out the three largest and the three smallest values.
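
The quasi-range calculations of this example can be reproduced with a short sketch. The data and coefficients are those printed in the example for n = 20; the helper name `quasi_range` is an illustrative choice.

```python
import math

# Deviations from the example (n = 20).
data = [-237, -133, -93, -77, -75, -70, -66, -65, -34, -28,
        -23, -13, -10, 57, 65, 142, 154, 173, 209, 270]

def quasi_range(sample, r):
    """r-th quasi-range: (r+1)-th largest minus (r+1)-th smallest observation."""
    ordered = sorted(sample)
    return ordered[len(ordered) - 1 - r] - ordered[r]

w0, w1, w2 = (quasi_range(data, r) for r in range(3))

# Estimates using the coefficients quoted in the example.
sigma_1 = 0.355214 * w1
sigma_01 = 0.12670 * (w0 + 1.4769 * w1)
sigma_02 = 0.14192 * (w0 + 1.4640 * w2)

# Sample standard deviation and the efficient estimate, for comparison.
n = len(data)
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
sigma_efficient = s / 0.986934
```

The three quasi-range estimates reproduce the printed 121.5, 128.2, and 127.2; the sample standard deviation computed from the listed values agrees with the reported 127.2 to within about 0.1.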


2. Estimates of o for non-normal populations. 

2.1. Rectangular population. For the standard rectangular population (mean
zero and variance one), the probability density function is
$f(x) = 1/(2\sqrt{3})$, $-\sqrt{3} \le x \le \sqrt{3}$. It can easily be shown
(see Cramér [3], p. 372) that the expected value and the variance of $w_r$ are
$E(w_r) = 2\sqrt{3}\,(n - 2r - 1)/(n + 1)$ and
var $w_r = 12(2r + 2)(n - 2r - 1)/[(n + 1)^2(n + 2)]$. An unbiased estimate of the
standard deviation of a rectangular population is given by
$\hat\sigma_r = w_r/E(w_r)$, where $E(w_r)$ is understood to be taken for the
standard rectangular population. The variance of $\hat\sigma_r$ is
var $\hat\sigma_r = (2r + 2)/[(n + 2)(n - 2r - 1)]$. It is evident that the range
is more efficient than any of the quasi-ranges for estimating $\sigma$, since
increasing r both increases the numerator and decreases the denominator of the
expression for var $\hat\sigma_r$. As a matter of fact, it can be shown that the
range is an efficient statistic for estimating the standard deviation of a
rectangular population. Table 5 gives the unbiased estimates $\hat\sigma_0$ for
n = 2 (1) 100. The numerical coefficients $1/E(w_0)$ are accurate to within
a unit in the sixth decimal place. Since the efficiency is always 100%, it is not
given in the table.
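
The closed forms of this paragraph are simple enough to evaluate directly. The sketch below, with hypothetical function names, computes the coefficient $1/E(w_r)$ and the variance of $\hat\sigma_r$ for the standard rectangular population.

```python
import math

def rect_coefficient(n, r=0):
    """1 / E(w_r) for the standard rectangular population."""
    return (n + 1) / (2.0 * math.sqrt(3.0) * (n - 2 * r - 1))

def rect_var_of_estimate(n, r=0):
    """Variance of w_r / E(w_r) when sigma = 1."""
    return (2 * r + 2) / ((n + 2) * (n - 2 * r - 1))

# Coefficients of the range for the smallest and largest tabulated sample sizes.
coef_n2 = rect_coefficient(2)
coef_n100 = rect_coefficient(100)

# The range (r = 0) has smaller variance than the first quasi-range (r = 1).
range_beats_quasi_range = rect_var_of_estimate(20, 0) < rect_var_of_estimate(20, 1)
```

The two coefficients round to .866025 and .294507, the first and last entries of Table 5.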

2.2. Exponential population. For the exponential population with mean and
variance each equal to one, the probability density function is $f(x) = e^{-x}$,
$0 \le x < \infty$. Rider [16] has shown that the expected value and the variance
of $w_r$ for samples of n from this population are

(3)    $E(w_r) = \sum_{j=r+1}^{n-r-1} \dfrac{1}{j}$

and

(4)    $\operatorname{var} w_r = \sum_{j=r+1}^{n-r-1} \dfrac{1}{j^2}.$

An unbiased estimate of $\sigma$ for an exponential population is
$\hat\sigma_r = w_r/E(w_r)$, where $E(w_r)$ is understood to be taken for an
exponential population with variance one. The variance of $\hat\sigma_r$ is

(5)    $\operatorname{var} \hat\sigma_r = \sum_{j=r+1}^{n-r-1} \dfrac{1}{j^2} \Bigg/ \Bigg(\sum_{j=r+1}^{n-r-1} \dfrac{1}{j}\Bigg)^{2}.$
TABLE 5

Estimates of Standard Deviation of Rectangular Population

Sample size, n | Estimate based on the range | Sample size, n | Estimate based on the range | Sample size, n | Estimate based on the range
   |              | 36 | .305171 $w_0$ | 71  | .296923 $w_0$
2  | .866025 $w_0$ | 37 | .304713 $w_0$ | 72  | .296807 $w_0$
3  | .577350 $w_0$ | 38 | .304279 $w_0$ | 73  | .296694 $w_0$
4  | .481125 $w_0$ | 39 | .303869 $w_0$ | 74  | .296584 $w_0$
5  | .433013 $w_0$ | 40 | .303479 $w_0$ | 75  | .296477 $w_0$
6  | .404145 $w_0$ | 41 | .303109 $w_0$ | 76  | .296373 $w_0$
7  | .384900 $w_0$ | 42 | .302757 $w_0$ | 77  | .296272 $w_0$
8  | .371154 $w_0$ | 43 | .302422 $w_0$ | 78  | .296173 $w_0$
9  | .360844 $w_0$ | 44 | .302102 $w_0$ | 79  | .296077 $w_0$
10 | .352825 $w_0$ | 45 | .301797 $w_0$ | 80  | .295983 $w_0$
11 | .346410 $w_0$ | 46 | .301505 $w_0$ | 81  | .295892 $w_0$
12 | .341162 $w_0$ | 47 | .301226 $w_0$ | 82  | .295803 $w_0$
13 | .336788 $w_0$ | 48 | .300959 $w_0$ | 83  | .295716 $w_0$
14 | .333087 $w_0$ | 49 | .300703 $w_0$ | 84  | .295631 $w_0$
15 | .329914 $w_0$ | 50 | .300458 $w_0$ | 85  | .295548 $w_0$
16 | .327165 $w_0$ | 51 | .300222 $w_0$ | 86  | .295467 $w_0$
17 | .324760 $w_0$ | 52 | .299996 $w_0$ | 87  | .295389 $w_0$
18 | .322637 $w_0$ | 53 | .299778 $w_0$ | 88  | .295311 $w_0$
19 | .320750 $w_0$ | 54 | .299569 $w_0$ | 89  | .295236 $w_0$
20 | .319062 $w_0$ | 55 | .299367 $w_0$ | 90  | .295162 $w_0$
21 | .317543 $w_0$ | 56 | .299172 $w_0$ | 91  | .295090 $w_0$
22 | .316168 $w_0$ | 57 | .298985 $w_0$ | 92  | .295020 $w_0$
23 | .314918 $w_0$ | 58 | .298804 $w_0$ | 93  | .294951 $w_0$
24 | .313777 $w_0$ | 59 | .298629 $w_0$ | 94  | .294883 $w_0$
25 | .312731 $w_0$ | 60 | .298461 $w_0$ | 95  | .294817 $w_0$
26 | .311769 $w_0$ | 61 | .298298 $w_0$ | 96  | .294753 $w_0$
27 | .310881 $w_0$ | 62 | .298140 $w_0$ | 97  | .294689 $w_0$
28 | .310058 $w_0$ | 63 | .297987 $w_0$ | 98  | .294627 $w_0$
29 | .309295 $w_0$ | 64 | .297839 $w_0$ | 99  | .294566 $w_0$
30 | .308584 $w_0$ | 65 | .297696 $w_0$ | 100 | .294507 $w_0$
31 | .307920 $w_0$ | 66 | .297557 $w_0$ |     |
32 | .307299 $w_0$ | 67 | .297423 $w_0$ |     |
33 | .306717 $w_0$ | 68 | .297292 $w_0$ |     |
34 | .306171 $w_0$ | 69 | .297166 $w_0$ |     |
35 | .305656 $w_0$ | 70 | .297043 $w_0$ |     |





For the exponential population with mean and standard deviation each equal to
c, the probability density function is $f(x) = (1/c)e^{-x/c}$, $0 \le x < \infty$.
The sample mean $\bar{x}$ is the efficient estimate of the parameter c, and has
variance $c^2/n$. When c = 1, var $\bar{x}$ = 1/n. Thus the efficiency, for an
exponential population whose lower limit is zero (or some other known value
$x_0$), of the estimate $\hat\sigma_r$ based on the rth quasi-range is given by the
ratio of the variance of the efficient estimate $\bar{x}$ (or $\bar{x} - x_0$) to
the variance of $\hat\sigma_r$, that is by

(6)    $\operatorname{Eff} \hat\sigma_r = \Bigg(\sum_{j=r+1}^{n-r-1} \dfrac{1}{j}\Bigg)^{2} \Bigg/ \Bigg(n \sum_{j=r+1}^{n-r-1} \dfrac{1}{j^2}\Bigg).$


The most efficient estimates $\hat\sigma_r$, together with their efficiencies,
have been computed for n = 2 (1) 100, but they will not be tabulated here, since
the efficiencies are somewhat disappointing, varying from 50.00% to 61.73%. It
should not be surprising that quasi-ranges, which are differences of symmetric
order statistics, are not very efficient in estimating the standard deviation of
an unsymmetric population. It is interesting, however, to note that the standard
deviation of an exponential population whose lower limit is known can be
estimated more efficiently from a single order statistic. The author is currently
investigating the efficiency of estimates based on a linear combination of two
order statistics, and preliminary results look promising.
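
The quoted range of efficiencies can be recovered from equation (6) by searching over r for each n. The following sketch does this with plain sums; the function names are illustrative, and the search range for r is an assumption that simply requires each sum to contain at least one term.

```python
def exp_efficiency(n, r):
    """Efficiency of the r-th quasi-range estimate, equation (6)."""
    js = range(r + 1, n - r)                 # j = r+1, ..., n-r-1
    h1 = sum(1.0 / j for j in js)
    h2 = sum(1.0 / (j * j) for j in js)
    return h1 * h1 / (n * h2)

def best_quasi_range(n):
    """(efficiency, r) of the most efficient single quasi-range for sample size n."""
    # r may run from 0 while r + 1 <= n - r - 1, i.e. r <= (n - 2) / 2.
    return max((exp_efficiency(n, r), r) for r in range(0, (n - 2) // 2 + 1))

best = {n: best_quasi_range(n) for n in range(2, 101)}
lowest = min(eff for eff, _ in best.values())
highest = max(eff for eff, _ in best.values())
```

Under these assumptions the lowest value is 50% (at n = 2) and the highest is about 61.73% (at n = 4, using the range), in line with the figures stated above.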

2.3. Bias when estimates which assume normality are used. Paragraphs 2.1 and
2.2 cover the cases in which the population being sampled is known to be
rectangular or exponential. Suppose, however, that the population is of one or the
other of these two types, but the investigator who is interested in estimating the
standard deviation is not aware of this fact, and proceeds to use one of the
estimates which assume normality. In this case, the estimate is no longer unbiased.
The bias of an estimate, based on one sample quasi-range, which assumes
normality, when the population being sampled is actually of some other type, is
given by

(7)    $B_0 = [E_0(w_r) - E_n(w_r)]/E_n(w_r).$

The bias of an estimate, based on a linear combination of two sample
quasi-ranges, which assumes normality, when the population being sampled is actually
of some other type, is given by

(8)    $B_0 = \dfrac{[E_0(w_r) + \lambda_{r,r'} E_0(w_{r'})] - [E_n(w_r) + \lambda_{r,r'} E_n(w_{r'})]}{E_n(w_r) + \lambda_{r,r'} E_n(w_{r'})}.$

In equations (7) and (8), $E_n$ represents an expected value taken for the normal
population, while $E_0$ represents an expected value taken for the other population,
both populations having variance one. Table 6 gives the bias for a rectangular
population and the bias for an exponential population when the estimates of
Table 4, which assume normality, are used. In both cases, the values of the bias
are accurate to within 0.01%.
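
Since equation (7) can be rewritten as $B_0 = E_0(w_r)/E_n(w_r) - 1$, the bias of a single-quasi-range estimate follows from the normal-theory coefficient $1/E_n(w_r)$ and the expected quasi-range of the actual population. The sketch below illustrates this for n = 2, using the coefficient .886227 quoted for the range in Table 4 and the expected values from paragraphs 2.1 and 2.2; the function names are illustrative, and the resulting figures are computed here rather than read from Table 6.

```python
import math

def e_rect(n, r):
    """E(w_r) for the standard rectangular population (paragraph 2.1)."""
    return 2.0 * math.sqrt(3.0) * (n - 2 * r - 1) / (n + 1)

def e_exp(n, r):
    """E(w_r) for the exponential population with variance one, equation (3)."""
    return sum(1.0 / j for j in range(r + 1, n - r))

def bias(normal_coefficient, e_other):
    """Equation (7), with normal_coefficient = 1 / E_n(w_r)."""
    return normal_coefficient * e_other - 1.0

# n = 2, r = 0: the normal-theory estimate is .886227 * w_0.
bias_rect = bias(0.886227, e_rect(2, 0))
bias_exp = bias(0.886227, e_exp(2, 0))
```

For this smallest case the normal-theory estimate overstates $\sigma$ by roughly 2.3% for a rectangular population and understates it by roughly 11.4% for an exponential one.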
TABLE 6

Bias (%) of Estimates which Assume Normality

Sample size, n | When Population is Rectangular: One quasi-range | Two adjacent quasi-ranges | Two quasi-ranges with r < r' ≤ 8 | When Population is Exponential: One quasi-range | Two adjacent quasi-ranges | Two quasi-ranges with r < r' ≤ 8

THE JOINT CUMULANTS OF TRUE VALUES AND ERRORS
OF MEASUREMENT


By Frederic M. Lord


Educational Testing Service, Princeton, New Jersey


1. Introduction. This note is concerned with the situation where U fallible
measurements of some single characteristic are made on each of a large number
of objects. The U measurements may represent U different methods of measuring
the same characteristic, each method involving a different frequency distribution
of errors of measurement.

For each object, there is an unknown "true value" of the characteristic. The
difference between the observed measurement and the true value is an error of
measurement. The true value and the errors of measurement will be termed
latent variables.

The results derived are currently being applied in psychometric work, but they
should be applicable in almost any field where unbiased fallible measurements
are made. For example, the true amount ($\xi$) of some chemical constituent of the
blood may have been fallibly but independently measured by U different methods
(or by U different laboratory technicians) for each of a large number of hospital
patients. The results given here will permit the consistent estimation of the first
U cumulants of $\xi$, the first U cumulants of the error of measurement in each of
the U methods, and, further, all the multivariate cumulants of the latent
variables up through order U.

In psychometric work, U strictly parallel forms of a mental test may be
prepared by matching the questions assigned to the different forms on their
statistical characteristics (determined by pretesting). These test forms may, in
effect, all be administered "simultaneously" by the device of interspersing the
questions from all forms and then scoring the questions of each form separately,
counting the number answered correctly. The moments of the frequency
distribution of the "true scores" ($\xi$) of the examinees tested and also the
distribution of the errors of measurement may now be estimated by the method to
be described. (In this case, the shape of the distribution of errors of measurement
must be dependent on the value of $\xi$. This is apparent, for example, from the
fact that the observed test scores cannot be negative; hence whenever $\xi$ is near
zero large negative errors of measurement cannot occur.)

Formulas illustrating the final results obtained are given in Section 2. The
derivations are given in the remaining sections.

In Section 3, any multivariate cumulant of the observed measurements is
expressed as a linear function of the cumulants of the joint distribution of the
latent variables, no assumption being made other than the existence of the
cumulants in question (the results of this section are not new; they could be
directly obtained by specializing a formula given by James ([1], eq. 13), for
example). In Section 4 it is assumed that each error of measurement has a mean
of zero and is uncorrelated with other appropriate chance variables; it is shown
that each multivariate (and univariate) cumulant of the latent variables can be
expressed in closed form as a simple linear function of the cumulants of the
observed measurements. Section 5 details the restrictions imposed on the
cumulants of the observed measurements by the assumption about uncorrelated errors.


2. Specific formulas. There are U observed measurements on each object,
denoted by $x_1, \cdots, x_U$. Both these and the true value, ξ, are random variables.
The errors of measurement are, by definition,

(1) $e_u = x_u - \xi$, $(u = 1, \cdots, U)$.


Any U-variate cumulant of the observed measurements is denoted by $\kappa_{C_1 \cdots C_U}$,
where $C_1, \cdots, C_U$ are nonnegative integers referring to the variables $x_1, \cdots, x_U$,
respectively. Any cumulant of the latent variables is similarly denoted by
$K_{B_0, B_1 B_2 \cdots B_U}$, where the first subscript refers to variable ξ and the other U
subscripts refer to the U errors of measurement. It will be notationally convenient
to use a zero-order cumulant, having all zero subscripts, that is by definition
equal to zero.

Explicit formulas are given below expressing all latent-variable cumulants up 
through the fourth order in terms of the observed-variable cumulants for the case 
where U = 4. All necessary formulas are either given or may be obtained by 
permutation of subscripts. The first subscript on each K, representing the true
value, is not subject to permutation, but the U other subscripts on each K or κ
may be permuted provided the same permutation is made on each K and κ
throughout the entire formula.


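$K_{0,1000} = K_{0,0100} = K_{0,0010} = K_{0,0001} = 0$, by assumption
$K_{1,0000} = \kappa_{1000} = \kappa_{0100} = \kappa_{0010} = \kappa_{0001}$

$K_{2,0000} = \kappa_{1100} = \kappa_{1010} = \cdots = \kappa_{0011}$
$K_{0,2000} = \kappa_{2000} - \kappa_{1100} = \cdots = \kappa_{2000} - \kappa_{0011}$
$K_{0,0200} = \kappa_{0200} - \kappa_{1100} = \cdots = \kappa_{0200} - \kappa_{0011}$, etc.

$K_{3,0000} = \kappa_{1110} = \cdots = \kappa_{0111}$
$K_{2,1000} = \cdots = K_{2,0001} = 0$, by assumption
$K_{1,1100} = \cdots = K_{0,1110} = 0$, etc., by assumption
$K_{1,2000} = \kappa_{2100} - \kappa_{1110} = \cdots = \kappa_{2001} - \kappa_{0111}$, etc.
$K_{0,3000} = \kappa_{3000} - 3\kappa_{2100} + 2\kappa_{1110}$, etc.

$K_{4,0000} = \kappa_{1111}$
$K_{3,1000} = \cdots = K_{0,1111} = 0$, by assumption
$K_{2,2000} = \kappa_{2110} - \kappa_{1111}$, etc.
$K_{1,3000} = \kappa_{3100} - 3\kappa_{2110} + 2\kappa_{1111}$, etc.
$K_{0,2200} = \kappa_{2200} - \kappa_{2110} - \kappa_{1210} + \kappa_{1111}$, etc.
$K_{0,4000} = \kappa_{4000} - 4\kappa_{3100} + 6\kappa_{2110} - 3\kappa_{1111}$, etc.

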
3. General relations among cumulants. The characteristic function of the
latent variables may be written

(2) $F(T_0, T_1, \cdots, T_U) = E \exp\Big(i \sum_{u=0}^{U} T_u e_u\Big)$,

where E is the expectation symbol and $e_0 = \xi$. That of the observed measurements
is, by (1),

(3) $f(t_1, t_2, \cdots, t_U) = E \exp\Big(i \sum_{u=1}^{U} t_u x_u\Big) = E \exp\Big(i \sum_{u=0}^{U} t_u e_u\Big)$,

where $t_0 = \sum_{u=1}^{U} t_u$.

It is seen that the first characteristic function may be changed to the second
simply by replacing T by t. If the necessary cumulants exist, the cumulant-generating
function of the latent variables is

(4) $\log F(T_0, T_1, \cdots, T_U) = \sum_{B_0=0}^{\infty} \sum_{B_1=0}^{\infty} \cdots \sum_{B_U=0}^{\infty} i^{P} \, \frac{T_0^{B_0} T_1^{B_1} \cdots T_U^{B_U}}{B_0! \, B_1! \cdots B_U!} \, K_{B_0, B_1 \cdots B_U}$,

where $P = \sum_{u=0}^{U} B_u$. Take the right side of (4) and replace $T_1, \cdots, T_U$ by
$t_1, \cdots, t_U$ and $T_0^{B_0}$ by the multinomial expansion

(5) $t_0^{B_0} = \Big(\sum_{u=1}^{U} t_u\Big)^{B_0} = \sum_a \frac{B_0!}{a_1! \cdots a_U!} \, t_1^{a_1} \cdots t_U^{a_U}$,

where $a_1, \cdots, a_U$ are nonnegative integers and $\sum_a$ is over all sets of a such that
$\sum_{u=1}^{U} a_u = B_0$. This converts the right side of (4) to the cumulant-generating
function of the observed measurements:


(6) $\log f(t_1, \cdots, t_U) = \sum_{B_0=0}^{\infty} \; \sum_{B_1, \cdots, B_U = 0}^{\infty} \; \sum_a i^{P} \, \frac{t_1^{B_1 + a_1} \cdots t_U^{B_U + a_U}}{B_1! \cdots B_U! \, a_1! \cdots a_U!} \, K_{B_0, B_1 \cdots B_U}$.

The cumulant $\kappa_{C_1 \cdots C_U}$ of the observed measurements is the coefficient of the
term $i^{P'} t_1^{C_1} \cdots t_U^{C_U} / C_1! \cdots C_U!$ in the series at the right, where $P' = \sum_{u=1}^{U} C_u$.
If $B_u + a_u$ is replaced by $C_u$ in (6) and the terms rearranged, these cumulants
are found to be

(7) $\kappa_{C_1 \cdots C_U} = \sum_B \Big[ \prod_{u=1}^{U} \binom{C_u}{B_u} \Big] K_{B_0, B_1 \cdots B_U}$,

where $\binom{C_u}{B_u} = C_u! / B_u! (C_u - B_u)!$, and where $\sum_B$ is taken over all sets of
nonnegative integral values of the $B_u$ subject to the restrictions that $B_u \le C_u$ for
$u = 1, \cdots, U$ and $P' = P$.

The result given in equation (7) may also be expressed in terms of symbolic 
multiplication: 


(8) $\kappa_{C_1 \cdots C_U} \sim (\xi + e_1)^{C_1} (\xi + e_2)^{C_2} \cdots (\xi + e_U)^{C_U}$.


The ~ symbol may be replaced by an equals sign when each term $\xi^{B_0} e_1^{B_1} e_2^{B_2} \cdots e_U^{B_U}$
on the right has been replaced by $K_{B_0, B_1 \cdots B_U}$.

Formulas (7) or (8) express any Pth-order cumulant of the observed measurements
as a linear function of the Pth-order cumulants of the latent variables.
Without further assumptions, it is not possible to solve any set of these equa- 
tions for the unknown cumulants of the latent variables, for the reason that there 
are always fewer equations than unknowns. 
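
The expansion in (7) and (8) can be enumerated mechanically. The following is a minimal Python sketch, not part of the original paper; the function name `expand_observed_cumulant` and the encoding of subscripts as tuples are illustrative assumptions. For given subscripts $C_1, \cdots, C_U$ it lists every latent-variable cumulant on the right side of (7) with its binomial coefficient.

```python
from itertools import product
from math import comb


def expand_observed_cumulant(C):
    """Hypothetical helper: expand kappa_{C_1...C_U} as in equation (7).

    Returns a dict mapping latent subscripts (B_0, B_1, ..., B_U) to the
    coefficient prod_u binom(C_u, B_u), where B_0 = sum(C) - sum(B) so that
    the order P of every latent cumulant equals the order P' of kappa.
    """
    terms = {}
    for B in product(*(range(c + 1) for c in C)):
        coefficient = 1
        for c, b in zip(C, B):
            coefficient *= comb(c, b)
        terms[(sum(C) - sum(B),) + B] = coefficient
    return terms


if __name__ == "__main__":
    # kappa_{2100} for U = 4: six latent cumulants appear on the right side.
    for subscripts, coefficient in sorted(expand_observed_cumulant((2, 1, 0, 0)).items()):
        print(subscripts, coefficient)
```

Each key lists $B_0$ first, followed by $B_1, \cdots, B_U$; without further assumptions all of these latent cumulants are unknown, which is why the equations cannot yet be solved.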


4. Determining the cumulants of the latent variables. The result in (7) and 
(8) was obtained without any assumption about the distribution of the latent 
variables other than the existence of the cumulants. It will now be assumed 
that each error of measurement has a mean value of 0 and is uncorrelated with 
every product of the remaining latent variables. Thus $K_{B_0, B_1 \cdots B_U} = 0$ whenever
any $B_u$ (u > 0) is equal to 1. This is much less restrictive than the usual assumption
that the errors of measurement are distributed independently of the
true value and of each other. The present assumption, for example, permits the
variance of the errors of measurement and all higher moments to be dependent on
ξ; it is only the mean error of measurement that is independent of ξ.

With this assumption, we may proceed to prove 

THEOREM 1. Given that $K_{B_0, B_1 \cdots B_U} = 0$ whenever any $B_u = 1$ (u > 0), all equations
(7) for which $\sum_u C_u = P$ is constant can be ranked so that the right side
of each contains at most one nonzero K appearing in no preceding equation; thus,
given that the equations are consistent, they may be solved so as to express any K of
order ≤ U as a linear function of κ’s.

Let U − T be the number of subscripts on the left side of (7) that are equal
to 1, so that the observed-variable cumulant may be written $\kappa_{C_1 C_2 \cdots C_T 1 1 \cdots 1}$.
For $B_0 < U - T$, there must be at least one value of u > 0 for which $B_u = 1$
on the right side of (7), so every $K_{B_0, B_1 \cdots B_U}$ will vanish whenever $B_0 < U - T$.
For $B_0 = U - T$, there is on the right side of (7) one and only one cumulant,
$K_{B_0, B_1 \cdots B_U}$, without unit subscripts; this unique nonvanishing cumulant has T
subscripts that are the same as those of the observed-variable cumulant and
(at least) U − T zero subscripts, so it may be written $K_{(U-T), C_1 C_2 \cdots C_T 0 0 \cdots 0}$;
it will be spoken of as the latent-variable cumulant to which $\kappa_{C_1 C_2 \cdots C_T 1 1 \cdots 1}$ (with
U − T unit subscripts) corresponds. For $B_0 > U - T$, there may be on the right
side of (7) a number of cumulants $K_{B_0, B_1 \cdots B_U}$ without unit subscripts; each of
these, however, is a latent-variable cumulant to which some other observed-variable
cumulant $\kappa_{B_1 B_2 \cdots 1 1 \cdots 1}$ having $B_0$ unit subscripts corresponds.

¹ As pointed out by a referee, formula (8) shows that the relation between observed-variable
and latent-variable cumulants is exactly the same as the relation between observed-variable
and latent-variable moments about an arbitrary origin.

Consider all the observed-variable cumulants of a given order (P) and suppose 
them to be grouped according to U — T, the number of unit subscripts, and the 
groups ranked on U − T starting with the cumulant for which T = 0. The results
of the preceding paragraph show that equation (7) expresses any κ as a
linear function of K’s, one of these being the K to which the given κ corresponds,
all the others being K’s to which correspond some other κ’s of lower rank. This
completes the proof of Theorem 1.
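
The ranking argument of Theorem 1 amounts to solving a triangular system by back-substitution. The Python sketch below is an illustration under stated assumptions, not the author's procedure: the function names are hypothetical, subscripts are encoded as tuples, and the particular κ chosen to correspond to each K (unit subscripts placed in the first available zero positions) is one arbitrary choice among those that Theorem 2 of the next section shows to be equal.

```python
from itertools import product
from math import comb


def corresponding_kappa(B0, B):
    """Subscripts of one observed cumulant to which K_{B0,B} corresponds:
    B0 of the zero subscripts of B are replaced by unit subscripts."""
    C = list(B)
    remaining = B0
    for position, b in enumerate(C):
        if b == 0 and remaining > 0:
            C[position] = 1
            remaining -= 1
    if remaining:
        raise ValueError("not enough zero subscripts; the order must not exceed U")
    return tuple(C)


def latent_from_observed(B0, B):
    """Express K_{B0,B} (B without unit subscripts) as {kappa subscripts: coefficient}."""
    B = tuple(B)
    if 1 in B:
        raise ValueError("cumulants with a unit error subscript vanish by assumption")
    C = corresponding_kappa(B0, B)
    solution = {C: 1}
    # Every other nonvanishing K in equation (7) for kappa_C has a larger first
    # subscript, so the recursion terminates.
    for Bp in product(*(range(c + 1) for c in C)):
        if 1 in Bp:
            continue
        B0p = sum(C) - sum(Bp)
        if (B0p, Bp) == (B0, B):
            continue
        coefficient = 1
        for c, b in zip(C, Bp):
            coefficient *= comb(c, b)
        for kappa, value in latent_from_observed(B0p, Bp).items():
            solution[kappa] = solution.get(kappa, 0) - coefficient * value
    return {kappa: value for kappa, value in solution.items() if value != 0}


if __name__ == "__main__":
    print(latent_from_observed(0, (4, 0, 0, 0)))
```

For U = 4 this reproduces the fourth-order formulas of Section 2, for example the coefficients 1, −4, 6, −3 in the expression for $K_{0,4000}$.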


5. Restrictions on the observed cumulants. If none of the C’s are zero in
$\kappa_{C_1 C_2 \cdots C_T 1 1 \cdots 1}$, then this is the only κ that corresponds to $K_{(U-T), C_1 C_2 \cdots C_T 0 0 \cdots 0}$.
If V of the C’s in $\kappa_{C_1 C_2 \cdots C_T 1 1 \cdots 1}$ are zero, then every κ obtained by permuting the
zero and unit subscripts on κ also corresponds to the same K.

It follows from Theorem 1 that any K may be expressed as equal to any one of
the $\binom{U-T+V}{V}$ corresponding κ’s plus other κ’s of lower rank (unless the first
κ already is of lowest rank). For the κ’s of lowest rank, T = V and there are
$\binom{U}{T}$ equally good equations, such as $K_{(U-T), 0 0 \cdots 0} = \kappa_{1 \cdots 1 0 0 \cdots 0}$, there being
U zero subscripts on the left side of the equation, T zero subscripts and U − T
unit subscripts on the right. These $\binom{U}{T}$ different κ’s must thus be equal.
Proceeding to the case of next higher rank, another set of κ’s are found that must
be equal to each other. Mathematical induction now shows that all κ’s corresponding
to a given K must be equal. Thus,

THEOREM 2. Given that $K_{B_0, B_1 \cdots B_U} = 0$ for any $B_u = 1$ (u > 0), any two κ’s
will be equal if their subscripts are the same except for a permutation that involves
zero and unit subscripts only.

Theorem 2 states a restriction on the observed-variable κ’s that is implicit in
the assumption made in Section 4 about uncorrelated errors. Since the matrix
of the large-sample sampling variances and covariances of the κ’s could be computed
if desired, the assumption made in Section 4 can be submitted to statistical
test, at least in large samples, to determine whether or not any given set of observed
data is compatible with it.


SOME TESTS OF PERMUTATION SYMMETRY¹


By R. WORMLEIGHTON


University of Toronto 


Summary. The two-sample sign test is viewed as a test for the permutation 
symmetry of a bivariate distribution, and extensions to k-variate distributions 
are sought. Friedman’s rank test, [1], although originally intended as a sub- 
stitute for the F-test in a two-way classification, is such an extension. Study of the 
family of two-sample sign tests obtained by comparing the k coordinates pair- 
wise has yielded a statistic with an asymptotic Chi-square distribution from 
which a further test of symmetry can be constructed. The statistic is based on 
more degrees of freedom than Friedman’s and is sensitive to a greater variety 
of alternatives. This extension is analogous to that obtained by Terpstra [2]
from the Wilcoxon test.² In this case, however, the limiting distribution turns
out to be non-singular. The argument leading to the test is not restricted to the
case of complete symmetry but may be carried through with any specified degree
of asymmetry. The coordinates may also be compared m at a time, 2 ≤ m ≤ k.
The argument can be extended and, with a slight modification, includes the
derivation of Friedman’s test. Thus a hierarchy of tests of permutation symmetry
is available: Friedman’s test corresponds to the case m = 1; when
m = k, the corresponding test turns out to be Pearson’s Chi-square.


1. Introduction. Given n pairs of observations, sometimes called two
“matched” samples, the sign test statistic for comparing the populations
from which the two matched samples were drawn is the number of cases in
which the first observation of a pair is greater than the second: a simple count.
The statistic and its distribution are easily computed, the test requires minimal 
assumptions about the underlying probability distributions, and when these 
distributions are normal, the efficiency of the test relative to the t-test is high 
[3]. It is natural, therefore, to explore extensions of the test to three or more 
matched samples. 

In the case of three samples, J. W. Tukey has suggested the following approximate
test. To make the test at level α′, conduct ordinary two-tailed sign
tests comparing each of the three pairs of samples, at level α = α′/3. If one or
more of these three tests yields a significant result, the combined test is significant.
This is a convenient approximation, but it does not appear to be worthwhile
to extend this method to the case of more than three samples.
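
The approximate three-sample procedure just described can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the exact two-tailed sign test is computed from the binomial distribution with probability 1/2, and the treatment of ties (tied pairs are dropped) is an assumption that the text does not address.

```python
from itertools import combinations
from math import comb


def two_tailed_sign_p(wins, n):
    """Exact two-tailed sign-test p-value for `wins` successes in n untied pairs."""
    tail = min(wins, n - wins)
    p = 2 * sum(comb(n, j) for j in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def tukey_three_sample_sign_test(rows, alpha_prime=0.05):
    """rows: list of triples, one per matched observation.

    Each of the three pairwise sign tests is run at level alpha_prime / 3;
    the combined test is significant if any one of them is.
    """
    alpha = alpha_prime / 3
    p_values = {}
    for i, j in combinations(range(3), 2):
        untied = [(r[i], r[j]) for r in rows if r[i] != r[j]]  # assumption: drop ties
        wins = sum(a > b for a, b in untied)
        p_values[(i, j)] = two_tailed_sign_p(wins, len(untied))
    return any(p <= alpha for p in p_values.values()), p_values


if __name__ == "__main__":
    significant, p_values = tukey_three_sample_sign_test([(3, 2, 1)] * 10)
    print(significant, p_values)
```

Dividing the level by three is the usual Bonferroni-type allowance for making three tests, which is why the combined test is only approximate.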

In the general case of k matched samples of n (that is to say, n observations
on k-variate distributions) one extension has been given by Friedman [1]. The
k coordinates of each observation are ranked in order of magnitude, and the mean
ranks calculated. The sum of squares of deviations of these k mean ranks from
their general mean is proportional to a statistic $\chi_r^2$ which, under the appropriate
null hypothesis, has asymptotically a Chi-square distribution with k − 1 degrees
of freedom. Another extension is suggested in the preceding paragraph.
Consider the family of two-sample sign tests obtained by comparing the k coordinates
pairwise; the corresponding statistics form a set of $C_2^k$ simple counts, the
number of times the ith coordinate exceeds the jth in the sample. This paper is
primarily concerned with this set of simple counts.


Received April 16, 1956; revised April 13, 1959.
¹ Work done in part under a grant from the Office of Naval Research held at Princeton
University.
² The author is indebted to the editors for drawing his attention to the work of Terpstra.

Underlying much discussion of the sign test is an intuitive picture of two in- 
dependent and similar, if not identical, populations which are to be compared 
for differences in location. Friedman’s rank test is the natural generalization. 
However, regarding the two paired samples as a single sample from a bivariate 
distribution, the sign test becomes a test of the permutation symmetry of the 
distribution; the null hypothesis states that both orderings of the coordinates 
are equally probable. In the case of a k-variate distribution, the most general
non-parametric test of permutation symmetry is based on the statistic (of dimension
k!) giving the number of times each of the k! possible orderings occurs
in the sample. While such a test is of use against any non-symmetric alternative,
it would appear that, unless the sample size is of the order k!, only the most
extreme departures from symmetry would be detected. If only certain kinds of
asymmetry are of interest then a more specific test is required; Friedman’s rank
test is an example of such a specific test.

Suppose one wishes to determine whether a card-shuffling device is acceptable. 
Ideally, no matter what the order of the cards before shuffling, all orderings 
should be equally probable after shuffling. It would be acceptable, however, if it 
were practically impossible for a card-player, knowing the initial order of the 
cards, and the final position of some of them, to draw inferences about the posi- 
tion of the remainder. A bridge player holding the Queen of Spades should not 
be able to infer from the previous hand that the King of Spades is more likely to 
be on his right than on his left. If the initial position of a card is i, let $X_i$ denote
its final position after shuffling. Then we are concerned with the symmetry of the
multivariate distribution of the $\{X_i\}$. The most general test may very well require
an impossibly large experiment, since 52! is a very large number. On the other hand,
Friedman’s rank test is too specific. A single cut, provided all possible places for
the cut are equally likely (including no cut at all), is sufficient to ensure that the
expectation of $X_i$ is the same for all i; and, as is shown below, this implies that
the expectation of $\chi_r^2$ is the same as it would be under the null hypothesis of
complete symmetry. Clearly, something intermediate is required.


2. Notation. Let

$X^{(a)} = (X_1^{(a)}, X_2^{(a)}, \cdots, X_k^{(a)})$, $a = 1, 2, \cdots, n$,

represent n k-variate real-valued random variables. Assume, as null hypothesis,
that

(2.1) $\Pr\{X_{i_1}^{(a)} < X_{i_2}^{(a)} < \cdots < X_{i_k}^{(a)}\} = 1/k!$

for all permutations $(i_1, i_2, \cdots, i_k)$ of the subscripts $(1, 2, \cdots, k)$, and all a.
Let

(2.2) $Y_{ij}^{(a)} = 1$ if $X_i^{(a)} > X_j^{(a)}$, $i \ne j$,
      $Y_{ij}^{(a)} = 0$ otherwise,

and define

(2.3) $\beta_{ij} = \frac{2}{\sqrt{n}} \sum_{a=1}^{n} \big(Y_{ij}^{(a)} - \tfrac{1}{2}\big)$.

Since $\beta_{ij} + \beta_{ji} = 0$, we need consider only one of each pair, $\{\beta_{ij}, \beta_{ji}\}$. It will
appear (Lemma 1) that the choice does not affect our conclusions and we consider,
therefore, only the set $\{\beta_{ij}\}$ for which i < j.
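
As an illustration of definitions (2.2) and (2.3), the following Python sketch computes the standardized counts $\beta_{ij}$ for i < j from a sample. The function name `pairwise_beta` is hypothetical, coordinates are indexed from 0 rather than 1, and ties between coordinates are assumed not to occur.

```python
from math import sqrt


def pairwise_beta(rows):
    """rows: list of n observations, each a sequence of k real numbers.

    Returns {(i, j): beta_ij} for i < j, where beta_ij standardizes the count
    of observations in which coordinate i exceeds coordinate j, using the
    null mean n/2 and null standard deviation sqrt(n)/2.
    """
    n = len(rows)
    k = len(rows[0])
    beta = {}
    for i in range(k):
        for j in range(i + 1, k):
            wins = sum(row[i] > row[j] for row in rows)  # sum over a of Y_ij^(a)
            beta[(i, j)] = 2.0 * (wins - n / 2.0) / sqrt(n)
    return beta


if __name__ == "__main__":
    print(pairwise_beta([(3, 2, 1), (3, 2, 1), (1, 3, 2), (2, 1, 3)]))
```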

The exact probability at any point $\{\beta_{ij}\}$ is given by a sum of multinomial terms.

THEOREM I:

(2.4) $\Pr(\{\beta_{ij}\}) = \frac{1}{(k!)^n} \sum \frac{n!}{\prod_{h=1}^{k!} a_h!}$,

where the sum is taken over all sets of non-negative integers $\{a_h\}$, $(h = 1, 2, \cdots, k!)$,
satisfying

(2.5) $\sum_{h=1}^{k!} a_h = n$

and a set of $C_2^k$ linear equations of the form

(2.6) $2 \sum_{h=1}^{k!} c_{ij,h} a_h = \sqrt{n}\, \beta_{ij} + n$, $c_{ij,h}$ = 1 or 0 as required.

The moments of the $\beta_{ij}$ can be computed directly. $E\{\beta_{ij}\} = 0$. For any admissible
choice of $\{\beta_{ij}\}$, the covariance matrix is non-singular and can be inverted.


THEOREM II: Let $(\sigma_{ii',jj'})$ be the covariance matrix of the $C_2^k$-dimensional random
variate $\{\beta_{ij}\}$. Let D be its determinant and let $(\sigma^{ii',jj'})$ be its inverse. Then for all
permissible choices of the coordinates $\{\beta_{ij}\}$,

(2.7) $D = (k+1)^{k-1} / 3^{k(k-1)/2}$,

and, for the choice i < i′, j < j′,

$\sigma^{ii',jj'} = \frac{3(k-1)}{k+1}$ if i = j, i′ = j′;

$\sigma^{ii',jj'} = -\frac{3}{k+1}$ if i = j, i′ ≠ j′, or i ≠ j, i′ = j′;

$\sigma^{ii',jj'} = \frac{3}{k+1}$ if i′ = j, or i = j′;

$\sigma^{ii',jj'} = 0$ otherwise.


Outline of proof: In the determinant, D, rows can be replaced by linear com- 
binations of rows to introduce blocks of zeros. Precisely, we replace ox", jp by 


1 = 
(2.8) Tis’.ip = Fi'.jp — = Dy Fides all (i2’). 
, 
(y¥<p) 


D can then be written as a product of principal minors,

(2.9) $D = \prod_{p=2}^{k} d_p$,

where $d_p$ is of order (p − 1) with elements (2(p + 1))/3p on the diagonal, and
(p + 1)/3p elsewhere.

(2.10) $d_p = p \left[ \frac{p+1}{3p} \right]^{p-1}$.

Hence, $D = (k+1)^{k-1} / 3^{k(k-1)/2}$.

To verify the remainder of the theorem, we multiply the covariance matrix by
its stated inverse, and evaluate the sums, $\sum_{(ii')} \sigma_{hh',ii'} \sigma^{ii',jj'}$. A term of such a
sum will differ from zero only if the pair of indices (ii′) has an element in common
with each of the pairs, (hh′) and (jj′).

For a diagonal element of the product, (hh′) = (jj′). There is one term in the
sum with (ii′) = (hh′) of value (3(k − 1))/(k + 1); and 2(k − 2) terms in
which (ii′) has one element in common with (hh′), each of value −1/(k + 1);
all other terms are zero, and the sum is unity.

For non-diagonal elements of the product, where (hh′) and (jj′) have one or
zero elements in common, we have, for example:

(i) $\sum_{(ii')} \sigma_{12,ii'} \sigma^{ii',13} = \sigma_{12,12}\sigma^{12,13} + \sigma_{12,13}\sigma^{13,13} + \sigma_{12,23}\sigma^{23,13} + \sum_{\alpha=4}^{k} \sigma_{12,1\alpha}\sigma^{1\alpha,13} = 0$

(ii) $\sum_{(ii')} \sigma_{12,ii'} \sigma^{ii',34} = \sigma_{12,13}\sigma^{13,34} + \sigma_{12,14}\sigma^{14,34} + \sigma_{12,23}\sigma^{23,34} + \sigma_{12,24}\sigma^{24,34} = 0$.
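
Theorem II can be checked numerically for small k. The sketch below is an illustration, not part of the original argument. It assumes the null covariances that follow from (2.1) to (2.3): 1 on the diagonal, 1/3 for two pairs sharing their first or their second index, −1/3 for two pairs in which the second index of one is the first index of the other, and 0 for disjoint pairs. All function names are hypothetical, and exact rational arithmetic is used so that the comparisons are exact.

```python
from fractions import Fraction
from itertools import combinations


def _pattern(k, diagonal, aligned, crossed):
    """Matrix indexed by pairs (i, i') with i < i', filled in by overlap type."""
    pairs = list(combinations(range(1, k + 1), 2))

    def entry(p, q):
        if p == q:
            return diagonal
        (i, i2), (j, j2) = p, q
        if i == j or i2 == j2:      # shared index in the same position
            return aligned
        if i == j2 or i2 == j:      # shared index in opposite positions
            return crossed
        return Fraction(0)

    return [[entry(p, q) for q in pairs] for p in pairs]


def null_covariance(k):
    return _pattern(k, Fraction(1), Fraction(1, 3), Fraction(-1, 3))


def stated_inverse(k):
    return _pattern(k, Fraction(3 * (k - 1), k + 1), Fraction(-3, k + 1), Fraction(3, k + 1))


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def is_inverse(a, b):
    size = len(a)
    return all(
        sum(a[r][t] * b[t][c] for t in range(size)) == (1 if r == c else 0)
        for r in range(size) for c in range(size)
    )


if __name__ == "__main__":
    for k in range(2, 6):
        print(k, determinant(null_covariance(k)), is_inverse(null_covariance(k), stated_inverse(k)))
```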


3. The statistic x. We now define the statistic on which our first test of 
permutation symmetry is based. Let 


(3.1) s= DD of * BieBiy 


i<t’ j<i’ 


x; is thus the quadratic form associated with the inverse of the covariance matrix 
of β, for a particular choice of the coordinates of β. This particular choice is only
a notational convenience; the same statistic is obtained with any other admissible
choice (Lemma 1).

TABLE I
Distribution of the statistic for k = 3; n = 2, 3, 4, 5, 6
[Columns: x, Pr(statistic ≤ x); entries not legible]


The exact distribution of xi, for n finite, can be computed from the distribution 
of 6 as given in Theorem I. This is an arduous process, except when k and n are 
both small. A few values are given in Table I, for k = 3. 

The asymptotic distributions of 8 and xi are given by 

TueroreM III: The vector random variate, 8 = {8;;}, has asymptotically a non- 
singular multivariate normal distribution with density function Ce t*, The statistic 
x; is asymptotically distributed as Chi-square with C2 degrees of freedom. 

PROOF: β is the standardized sum of n identically and independently distributed
vector random variates. Since all second moments are finite, the simplest
conditions for the central limit theorem in its multivariate form, [4], are satisfied.
Therefore, as n increases, the distribution of β tends to the multivariate normal.
The covariance matrix is non-singular and independent of n; hence, the density
function exists.

It is well-known that the exponent of the density function is distributed as 
Chi-square, [5]. Hence, x is asymptotically Chi-square with C2 degrees of 
freedom. 

The null hypothesis should be rejected whenever x? is large. 
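
A minimal Python sketch of the computation follows. It is an illustration under stated assumptions rather than the author's procedure: the function name is hypothetical, coordinates are indexed from 0, ties are assumed absent, and the inverse covariance entries are taken in the closed form 3(k − 1)/(k + 1), −3/(k + 1), 3/(k + 1), 0 that results from inverting the null covariance matrix of the $\beta_{ij}$. The returned value would be referred to the Chi-square distribution with k(k − 1)/2 degrees of freedom.

```python
from itertools import combinations
from math import sqrt


def pairwise_sign_statistic(rows):
    """Quadratic form in the standardized pairwise counts beta_ij (i < j)."""
    n = len(rows)
    k = len(rows[0])
    pairs = list(combinations(range(k), 2))
    beta = {
        (i, j): 2.0 * (sum(r[i] > r[j] for r in rows) - n / 2.0) / sqrt(n)
        for i, j in pairs
    }

    def inverse_entry(p, q):
        if p == q:
            return 3.0 * (k - 1) / (k + 1)
        (i, i2), (j, j2) = p, q
        if i == j or i2 == j2:
            return -3.0 / (k + 1)
        if i == j2 or i2 == j:
            return 3.0 / (k + 1)
        return 0.0

    return sum(inverse_entry(p, q) * beta[p] * beta[q] for p in pairs for q in pairs)


if __name__ == "__main__":
    # Three identical orderings with k = 3: the most extreme sample for n = 3.
    print(pairwise_sign_statistic([(3, 2, 1)] * 3))
```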

An indication of the accuracy of the asymptotic approximation is given in 
Table II for k = 3 and small n.

TABLE II
Comparison of X{ with approximating Chi-square, Xj, with $ d.f. (k = 3) 
X 2 Pr(X? > x) Pr(X} 2 x) x Pr(Xt > x) Pr(X? > x) 
n=3 n= 65 
9.0 028 029 | 10.2 .0162 .0170 
5.0 .361 .172* 7.8 .0471 .0504 
6.6 1088 0859* 
n=4 n=6 
12.0 .0046 .0074 9.0 .0315 .0295 
7.5 .0807 .0576* 8.0 .0400 .0461 
6.0 . 1343 -1117 7.0 .1017 .0720* 


* Only in the cases marked by an asterisk is the approximation improved by a con- 
tinuity correction. 


4. Friedman’s rank test. There exists a hierarchy of tests of permutation 
symmetry, one of which is the X7-test of the previous section. Another such test, 
lying at one end of the chain, is Friedman’s rank test [1]. 

In the rank test, the k coordinates of an observation are ranked in increasing
order of magnitude. If $r_{i\nu}$ denotes the rank of coordinate $X_i$ in the νth observation,
then $r_{i\nu} - 1$ = no. of coordinates less than $X_i$. Let


(4.1) gate (>. - Et). 


TL y= 


Friedman proposed the statistic 


(4.2) fa wg 

‘ " kk+t)Dm" 
for testing the null hypothesis. Large values of x? lead to rejection of the hy- 
pothesis. 

Friedman tabulated the exact distribution of x? for small n and k, and gave 
a proof that x? is asymptotically distributed as Chi-square with (k — 1) degrees 
of freedom. 
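
For comparison, Friedman's statistic can be computed as in the following Python sketch. It uses the standard form 12/(nk(k + 1)) times the sum of squared deviations of the k rank sums from their common expectation n(k + 1)/2; the function name is hypothetical and ties within an observation are assumed not to occur.

```python
def friedman_statistic(rows):
    """rows: list of n observations, each a sequence of k distinct real numbers."""
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0] * k
    for row in rows:
        # Rank 1 goes to the smallest coordinate, rank k to the largest.
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    expected = n * (k + 1) / 2.0
    return 12.0 / (n * k * (k + 1)) * sum((total - expected) ** 2 for total in rank_sums)


if __name__ == "__main__":
    print(friedman_statistic([(3, 2, 1)] * 3))
    print(friedman_statistic([(1, 2, 3), (2, 3, 1), (3, 1, 2)]))
```

The second sample illustrates a limitation discussed in the Introduction: the rank sums are all equal, so the statistic is zero even though only three of the six possible orderings ever occur.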

We note that 


(4.3) Bi = 7 > Bd. 


Hence, 


et 3 2 2 
(44) © REED ES] % 0 | 
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and 
(4.5) xi = 3 y Bis — kxe 
(<j) 
We also define 
(4.6) XA = Xt — XxF- 


From Fisher’s Lemma [6], we can deduce that x4 is asymptotically Chi-square 
with C? — (k — 1) = Cy" degrees of freedom, and independent of x?. 
Friedman also showed that, for finite n, 


(48) Ski "wee aa 2(k — 1). 


Similarly, we obtain, by direct calculation, 


Exi= Ci Varxt = "—* kk - 1) 
(4.9) 
Bx = OF" Var xi = "—* (kb - 1)(k - 2). 

For each of these statistics, the mean is independent of n, and the variance is 
an increasing function of n. This strongly suggests that the use of the asymptotic 
distribution, (without a continuity correction), for defining the critical region 
will lead to errors in the so-called “‘safe’’ direction; i.e., the true size of the criti- 
cal region will be smaller than the significance level. The computations carried out 
by Friedman, and by the writer, support this. 

x; and xi provides tests of the same null hypothesis. In most practical ap- 
plications, the alternatives of interest—e.g., one or more coordinates tending to 
be consistently higher than the remainder—can be distinguished by either test. 
In the writer’s opinion, x? is usually the preferred test, and x? should be reserved 
for special situations. An experimenter may, on occasion, wish to make two tests 
based on x? and x4. These two tests are asymptotically independent and are 
sensitive to two distinct classes of alternatives. The classification of alternatives 
is discussed in Section 9. 


5. First extension: arbitrary null hypothesis. Although the argument has been
presented with one particular hypothesis as null hypothesis (viz., all orderings
of the $\{X_i\}$ equally likely), it could just as easily be carried through with an
asymmetric null hypothesis:

(5.1) $\Pr\{X_{i_1} > X_{i_2} > \cdots > X_{i_k}\} = p_{i_1 i_2 \cdots i_k} > 0$,

where the $p_{i_1 i_2 \cdots i_k}$ are a set of k! positive numbers which sum to unity.
As a notational convenience, we introduce symbols representing sums of the
constants in the null hypothesis.

(5.2) $p_{i_1 i_2 \cdots i_m} = \Pr\{X_{i_1} > X_{i_2} > \cdots > X_{i_m}\}$, $2 \le m < k$.

Random variables $Y_{ij}^{(a)}$ are defined as before, (2.2), and the standardized
variates, $\beta_{ij}$, are defined similarly by

(5.3) $\beta_{ij} = \frac{1}{\sqrt{n p_{ij} (1 - p_{ij})}} \sum_{a=1}^{n} \big(Y_{ij}^{(a)} - p_{ij}\big)$.

The vector variate, β, is defined by $C_2^k$ coordinates, $\beta_{ij}$, which satisfy no linear
relation.

Let $\sigma_{ii',jj'} = \mathrm{Covar}\{\beta_{ii'}, \beta_{jj'}\}$. As before, $\mathrm{Var}(\beta_{ii'}) = 1$, and when i, i′, j, j′
are all distinct, $\sigma_{ii',jj'} = 0$. However, the covariance of two coordinates with
one subscript in common is more complex; e.g.


a ee ae ee —a 
fai — Pre) (1 = Ps) (1 nil Pr2) Ps 
- A oS 2 oS a Saige es . 
i 712,13 V Di Das Pies + Prise) pull <n Psi2 
(5.4) ; 
/Dr2 ] — pis) a = ee 
+ 23 ‘ 
“HW (1 — prz)pis — V7 (1 — pw)(1 — pis) * (Pan + Pan) 


However, all variances and covariances are finite, and the central limit theorem
still applies; therefore, β has asymptotically a multivariate normal distribution.

It will be shown (Theorem IV) that the rank of the matrix $(\sigma_{ii',jj'})$ of order
$C_2^k$ is also $C_2^k$; hence, its inverse $(\sigma^{ii',jj'})$ exists. We can therefore define

(5.5) $Q_2 = \sum_{(ii')} \sum_{(jj')} \sigma^{ii',jj'} \beta_{ii'} \beta_{jj'}$.

$Q_2$ is the exponent of the density function of the asymptotic distribution and
we deduce, as before, that $Q_2$ is asymptotically Chi-square with $C_2^k$ degrees of
freedom.

To test the null hypothesis, reject when $Q_2$ is large. The critical region can be
determined easily, and approximately, with the asymptotic distribution.


6. Second extension: the $Q_m$-statistics. Instead of comparing the random variables,
$\{X_i\}$, two at a time, we could compare them m at a time (2 ≤ m ≤ k).
We define new random variables of order m by

(6.1) $Z_{i_1 i_2 \cdots i_m}^{(a)} = 1$ if $X_{i_1}^{(a)} > X_{i_2}^{(a)} > \cdots > X_{i_m}^{(a)}$,
      $Z_{i_1 i_2 \cdots i_m}^{(a)} = 0$ otherwise,

where $(i_1, i_2, \cdots, i_m)$ is a subset of the first k positive integers.
There are $k(k-1) \cdots (k-m+1) = \theta_m$, say, such random variables of
order m.
If n observations are made, then $\sum_{a=1}^{n} Z_{i_1 i_2 \cdots i_m}^{(a)}$ = the number of times in
which the ordering $X_{i_1} > X_{i_2} > \cdots > X_{i_m}$ occurs.

We define the standardized random variate of order m by

(6.2) $\gamma_{i_1 i_2 \cdots i_m} = \frac{1}{\sqrt{n p_{i_1 i_2 \cdots i_m} (1 - p_{i_1 i_2 \cdots i_m})}} \sum_{a=1}^{n} \big(Z_{i_1 i_2 \cdots i_m}^{(a)} - p_{i_1 i_2 \cdots i_m}\big)$.

For any fixed set of integers $(i_1 i_2 \cdots i_m)$, the sum, over all permutations of the
set, of the $Z_{i_1 i_2 \cdots i_m}^{(a)}$ is unity.

The standardized random variates of order m satisfy $\tau_m = C_m^k$ independent
linear relations, (m ≥ 2).

When m = 2, $\gamma_{i_1 i_2} = \beta_{i_1 i_2}$ as defined in Section 5.

When m = 1, we extend the definition by defining

$\gamma_i$ = standardized mean rank
$\tau_1 = 1$ (not $C_1^k$, since the $\gamma_i$ satisfy only one linear relation).

For every fixed m, a vector random variate $\Gamma_m = \{\gamma_{i_1 i_2 \cdots i_m}\}$ of dimension $\theta_m$
is defined.

By the central limit theorem, the distribution of $\Gamma_m$ approaches the multivariate
normal distribution. However, because the coordinates satisfy $\tau_m$ linear
relations the distribution is singular in the full $\theta_m$-space.

We reduce the space to dimension $\theta_m - \tau_m$ by omitting one of the coordinates
appearing in each of the $\tau_m$ linear relations. (No coordinate appears in more than
one such relation, so that exactly $\tau_m$ coordinates are omitted.) We denote the
resulting vector random variate of dimension $(\theta_m - \tau_m)$ by $\gamma_m$. It will be shown
in Theorem IV that the covariance matrix of $\gamma_m$ is non-singular and therefore
(at least in theory) can be inverted. Hence, we can define the statistic $Q_m$ = the
quadratic form in $\gamma_{i_1 i_2 \cdots i_m}$ associated with the inverse of the reduced covariance
matrix. $Q_m$ is asymptotically Chi-square with $(\theta_m - \tau_m)$ degrees of freedom.
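
The dimension counts of this section are easy to tabulate. The Python sketch below is illustrative only (the function names are hypothetical); it computes $\theta_m = k(k-1)\cdots(k-m+1)$, the number of linear relations $\tau_m$ (with the special value $\tau_1 = 1$), and the resulting degrees of freedom $\theta_m - \tau_m$.

```python
from math import comb, perm


def theta(k, m):
    """Number of ordered selections of m of the k coordinates."""
    return perm(k, m)


def tau(k, m):
    """Number of independent linear relations among the order-m variates."""
    return 1 if m == 1 else comb(k, m)


def degrees_of_freedom(k, m):
    if not 1 <= m <= k:
        raise ValueError("m must satisfy 1 <= m <= k")
    return theta(k, m) - tau(k, m)


if __name__ == "__main__":
    for m in range(1, 5):
        print(m, degrees_of_freedom(4, m))
```

For m = 1 this gives k − 1, for m = 2 it gives k(k − 1)/2, and for m = k it gives k! − 1, in agreement with the degrees of freedom quoted for the three named tests in the hierarchy.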

LEMMA 1: For fixed m, and a given simple null hypothesis, $Q_m$ is a uniquely defined
function of the original sample.

PROOF: Non-uniqueness could only occur when reducing the $\theta_m$-space to dimension
$(\theta_m - \tau_m)$ by different choices of the coordinates to be retained.

Let $Q_m$ be defined in terms of one set $\gamma_{i_1 \cdots i_m}$ of coordinates, and let $\bar{Q}_m$ be defined
in terms of a second set, $\bar{\gamma}_{i_1 \cdots i_m}$, of these coordinates. Using the known
linear relations, each coordinate of the second set can be expressed as a linear
combination of coordinates in the first set; thus, $\bar{Q}_m$ is also a quadratic form in
the first set of coordinates. We must show, then, that corresponding coefficients
in the two forms are equal. It is clearly sufficient to consider only the asymptotic
distributions for, since the coefficients are independent of n, equality of coefficients
in the limit implies equality for all n.

Consider, therefore, two quadratic forms, $Q$, $\bar{Q}$ in the random variates $U_1$,
$U_2, \cdots, U_s$ where $U' = \{U_1, U_2, \cdots, U_s\}$ has an s-variate normal distribution
and $Q = U'AU$, $\bar{Q} = U'\bar{A}U$, have Chi-square distributions with s degrees
of freedom. This implies that the forms $Q$, $\bar{Q}$ are of full rank and their associated
matrices $A$, $\bar{A}$ are non-singular.

Therefore, there exist linear transformations

(6.3) $V = CU$, $\bar{V} = \bar{C}U$

transforming $Q$, $\bar{Q}$ into sums of squares of s independent standard normal variates,
where $V$, $\bar{V}$ denote s-dimensional column vectors and $C$, $\bar{C}$ non-singular
s × s matrices:

(6.4) $Q = V'V$, $\bar{Q} = \bar{V}'\bar{V}$.

Clearly, $\bar{V} = \bar{C}C^{-1}V = PV$, say. Since the coordinates of both $V$ and $\bar{V}$ are
independent standard normal variates, P is an orthogonal matrix. Thus $\bar{V}'\bar{V} = V'V$
and $Q = V'V = \bar{V}'\bar{V} = \bar{Q}$.

Essential for the validity of the proof, and for the truth of the lemma, is the
fact that both quadratic forms are of full rank. Otherwise, $C$ and $\bar{C}$ are singular
matrices without inverses, and $\bar{V}$ is not, in general, a linear transform of $V$.


7. Rank of $Q_m$: Expectation of $Q_m$. We have already used the fact that the
rank of the quadratic form $Q_m$ is $\theta_m - \tau_m$. We now give a proof of this statement.

THEOREM IV: Let $\gamma_m$ be the vector random variate of dimension $(\theta_m - \tau_m)$ defined
in Section 6 and let $A_m$ denote its covariance matrix of rank $r_m$. Then $r_m = \theta_m - \tau_m$.

LEMMA 2: $r_m < \theta_m - \tau_m$ if, and only if, a linear relation holds among the coordinates
of $\gamma_m$ with probability one. Proof omitted.

LEMMA 3: Each random variate $\gamma_{i_1 i_2 \cdots i_m}$ of order m can be expressed as a linear
combination of random variates of order (m + 1).

PROOF: $Z_{i_1 i_2 \cdots i_m} = 1$ whenever the ordering $X_{i_1} > X_{i_2} > \cdots > X_{i_m}$ occurs,
i.e. whenever one of the orderings $(X_j > X_{i_1} > \cdots > X_{i_m})$, $(X_{i_1} > X_j >
X_{i_2} > \cdots > X_{i_m})$, $\cdots$, $(X_{i_1} > \cdots > X_{i_m} > X_j)$ occurs, for fixed j not a
member of the set $(i_1, i_2, \cdots, i_m)$. Hence $Z_{i_1 i_2 \cdots i_m} = Z_{j i_1 \cdots i_m} + Z_{i_1 j i_2 \cdots i_m} + \cdots +
Z_{i_1 i_2 \cdots i_m j}$. The γ-variates are linear combinations of the Z-variates of the same
order. The lemma follows.

LEMMA 4: $r_k = k! - 1$.

PROOF: When m = k there is only one choice for the set of integers $i_1 i_2 \cdots i_m$.
Let h index the permutations of this set. By direct calculations we obtain

(7.1) $\mathrm{Var}(\gamma_h) = 1$, $\mathrm{Covar}(\gamma_h, \gamma_{h'}) = -\sqrt{\frac{p_h \, p_{h'}}{(1 - p_h)(1 - p_{h'})}}$.

We omit the coordinate corresponding to h = k! to obtain $A_k$.
The determinant of $A_k$ can be evaluated directly.

(7.2) $|A_k| = \frac{p_{k!}}{\prod_{h=1}^{k!-1} (1 - p_h)} \ne 0$

since by assumption, $p_{k!} \ne 0$.

(7.3) $r_k = k! - 1$.


PROOF OF THEOREM IV: Suppose $r_m < \theta_m - \tau_m$ for some m.

By Lemma 2 there exists a linear relation among the random variates $\gamma_{i_1 i_2 \cdots i_m}$
of order m represented in $A_m$. But, by Lemma 3 each of these can be expressed as
a linear combination of random variates of order (m + 1) represented in $A_{m+1}$.
Thus there exists a linear relation among the variates of order (m + 1) and
$r_{m+1} < \theta_{m+1} - \tau_{m+1}$. By induction, $r_k < k! - 1$. But this contradicts Lemma 4,
hence $r_m = \theta_m - \tau_m$.

The expected value of the statistic, $Q_m$, for finite n, is given by

THEOREM V: $E\{Q_m\} = \theta_m - \tau_m$.

PROOF: We can write

(7.4) $Q_m = \sum_{(i_1 \cdots i_m)} \sum_{(j_1 \cdots j_m)} \sigma^{i_1 \cdots i_m, j_1 \cdots j_m} \gamma_{i_1 \cdots i_m} \gamma_{j_1 \cdots j_m}$

where

$\sigma^{i_1 \cdots i_m, j_1 \cdots j_m} = \frac{\text{Cofactor of } E\{\gamma_{i_1 \cdots i_m} \gamma_{j_1 \cdots j_m}\} \text{ in } A_m}{|A_m|}$

and each summation is over the $(\theta_m - \tau_m)$ sets of m integers $(i_1, i_2, \cdots, i_m)$
which appear as subscripts of the $\gamma_{i_1 \cdots i_m}$.

$E\{Q_m\} = \sum_{(i_1 \cdots i_m)} \frac{1}{|A_m|} \sum_{(j_1 \cdots j_m)} [\text{Cofactor of } E\{\gamma_{i_1 \cdots i_m} \gamma_{j_1 \cdots j_m}\}] \cdot E\{\gamma_{i_1 \cdots i_m} \gamma_{j_1 \cdots j_m}\} = \sum_{(i_1 \cdots i_m)} \frac{|A_m|}{|A_m|} = \theta_m - \tau_m$.


8. The Case m = k. Let h index the permutations of (1, 2,--- , k). Let m, 
be the number of times that ordering, h, occurs in a sample of n, and let p, be 
the probability of that ordering. 

Then 


(8.1) _ (th — n-Mr) 


n= een nd A 
* “np — Pr) 
and 


> (1 — phi = eee 
h=1 me tr A Nn Dr 
which is immediately recognizable as Pearson’s Chi-square statistic. 

THEOREM VI: $Q_k = \sum_{h=1}^{k!} (1 - p_h)\, y_h^2$.

PROOF: It has been shown that $Q_k$ is asymptotically Chi-square with (k! − 1)
degrees of freedom, and it is well known that Pearson's statistic has the same
limiting distribution. Regarding one of the $y_h$ as a linear combination of the
remainder, both statistics are quadratic forms of rank (k! − 1) in the same (k! − 1)
variates with the same asymptotic Chi-square distribution. This is exactly the
situation considered in Lemma 1 (Section 6), and by an identical argument the
two statistics are equal:

$Q_k = \sum_{h=1}^{k!} (1 - p_h)\, y_h^2$.


9. Consistency and the classification of alternatives. To make a symmetry
test of order m, we compute $Q_m$ and reject the null hypothesis if $Q_m$ is large. But,
in a particular case, what order should the test be? The answer to this question
requires a consideration of the alternative hypothesis. Intuitively, the tests of
low order, such as Friedman's $\chi_r^2$, provide a relatively high sensitivity to a small
class of alternatives, whereas the high order tests give a low sensitivity (thus
requiring a large sample) against a large class of alternatives. This concept of
a classification of alternatives is made more precise by the following definition
and Theorem VII.

Definition: An alternative is distinct from the null hypothesis at level m if, under
the alternative,

$E\{Z_{i_1 i_2 \cdots i_m}\} \neq p_{i_1 i_2 \cdots i_m}$

for at least one choice and permutation of the digits $(i_1, i_2, \ldots, i_m)$.

Remark: Since $Z_{i_1 i_2 \cdots i_m}$ can be written as a sum of such counter variables of
higher order, it is immediate that, if an alternative is distinct at level m, it is
distinct at any level m' > m.

THEOREM VII: The symmetry test of order m is consistent against any alternative
which is distinct at level m, and only against such alternatives.

PROOF: The null hypothesis is rejected whenever $Q_m > c$, i.e., whenever the
point $y_m = \{y_{i_1 \cdots i_m}\}$ lies outside the region S defined by

(9.1) $S = [\, y_m \mid Q_m \le c \,]$.


c is chosen so that the measure of S, computed with the asymptotic distribution
under the null hypothesis, is 1 − α. S is then a fixed, finite region. Under any
hypothesis the measure of S tends to a specific value, P, as n approaches infinity:
we wish to show (1) that this value is zero under alternatives distinct at level m,
and (2) that this value is different from zero under alternatives not distinct at
level m.

The measure of S can be written in the form P + η, where P is the measure
of S under the approximating normal distribution with the same mean and
variance-covariance matrix as the given distribution, and η is a correction term which
approaches zero as n approaches infinity. The variances and covariances of $y_m$
are independent of n; and the mean of $y_m$, and of the approximating normal
variate, has coordinates

(9.2) $E\{y_{i_1 \cdots i_m}\} = \sqrt{\dfrac{n}{p_{i_1 \cdots i_m}(1 - p_{i_1 \cdots i_m})}}\; \bigl[ E\{Z_{i_1 \cdots i_m}\} - p_{i_1 \cdots i_m} \bigr]$.

Under any alternative not distinct at level m, these coordinates are all zero: the
approximating normal distribution, and therefore P, is independent of n. Clearly,
P ≠ 0, thus establishing the second part of the theorem.

Under an alternative distinct at level m, however, the distance d from the
mean of the distribution to the origin is given by

(9.3) $d^2 = n \sum_{(i_1 \cdots i_m)} \dfrac{\bigl[ E\{Z_{i_1 \cdots i_m}\} - p_{i_1 \cdots i_m} \bigr]^2}{p_{i_1 \cdots i_m}(1 - p_{i_1 \cdots i_m})} \neq 0$.

Thus the distance from the mean to the origin approaches infinity as n tends to
infinity.

It is well known that the ordinate of a normal distribution tends to zero as the
distance from the mean increases; hence the measure P of the fixed, bounded
set S, under the sequence of approximating normal distributions, tends to zero.
Since both P and η tend to zero under an alternative distinct at level m as n
approaches infinity, the consistency of the test against such alternatives is
established.

CONTRIBUTIONS TO THE THEORY OF RANK ORDER STATISTICS—
THE ONE-SAMPLE CASE¹


I. RICHARD SAVAGE
University of Minnesota
0. Summary. The one-sample problem is considered using techniques developed
earlier [2], [3]. Let $Z = (Z_1, \ldots, Z_N)$ be a random vector with $Z_i = 1\,(0)$
if the ith smallest in absolute value in a sample of N from the density f(x) is
positive (negative). Then

$P(Z = z) = N! \int \cdots \int_{0 \le y_1 \le \cdots \le y_N \le \infty} \prod_{i=1}^{N} \bigl[ f^{1 - z_i}(-y_i)\, f^{z_i}(y_i)\, dy_i \bigr]$.

Conditions are found implying P(Z = z) > P(Z = z') where z is derived from
z' by replacing a 0 by a 1, or interchanging a 0 and 1 in z' by moving the 1 to
the right. These conditions are met by the normal and other distributions.
The results are useful in finding good tests of such null hypotheses as
$X_1, \ldots, X_N$ are independently and identically distributed symmetrically about
zero against such alternatives as slippage to the right. The Wilcoxon one-sample
signed rank test is a typical nonparametric procedure used under these
conditions [4].


1. Assumptions and notations. Throughout it is assumed that $X_1, \ldots, X_N$
are independently and identically distributed random variables with a continuous
distribution function F(x, θ) having a density function f(x, θ). θ will
be a real valued parameter and under the null hypothesis $H_0$: θ = 0.

If $x_1, \ldots, x_N$ are the observations and $y_1, \ldots, y_N$ are the absolute values
of the observations arranged from smallest to largest, then $z = (z_1, \ldots, z_N)$ is
defined to be the observed rank order, where $z_i = 1$ if $y_i$ is the absolute value
of a positive number and $z_i = 0$ if $y_i$ is the absolute value of a negative number.
Thus $n = \sum_{i=1}^{N} z_i$ is the number of positive observations and $m = \sum_{i=1}^{N} (1 - z_i)$
is the number of negative observations. Corresponding to the observed
$y = (y_1, \ldots, y_N)$ and $z = (z_1, \ldots, z_N)$ are the random variables
$Y = (Y_1, \ldots, Y_N)$ and $Z = (Z_1, \ldots, Z_N)$. There are $2^N$ possible values of Z. For
a specified value of n there are $\binom{N}{n}$ values of Z. For n fixed the conditional
distribution of Z is that of the two-sample problem [2] where the first population
has the c.d.f.


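$F^{-}(x, \theta) = \begin{cases} \dfrac{F(0, \theta) - F(-x, \theta)}{F(0, \theta)}, & x \ge 0 \\ 0, & x < 0 \end{cases}$

and the second population has c.d.f.

$F^{+}(x, \theta) = \begin{cases} \dfrac{F(x, \theta) - F(0, \theta)}{1 - F(0, \theta)}, & x \ge 0 \\ 0, & x < 0. \end{cases}$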
Thus for fixed n the partial order problem is exactly that treated in [2] where
$F(x) = F^{-}(x)$ and $G(x) = F^{+}(x)$. The previous results are not immediately
applicable, however, since it is not clear how to impose conditions on F(x, θ)
in order to get $F^{-}(x, \theta)$ and $F^{+}(x, \theta)$ to satisfy the conditions of [2]. In Section 2,
the case of fixed n is considered. The notation z'Lz denotes the following
relationship: $z'_k = z_k$ for all k = 1, ..., N except i and j (i < j), and $z_i = z'_j = 0$,
$z'_i = z_j = 1$. This notation is also used if there exist $z^1, \ldots, z^r$ such that
$z'Lz^1 \cdots z^rLz$, e.g., (1010)L(0101) since (1010)L(0110) and (0110)L(0101).
In Section 3, the partial order of the probabilities of two rank orders having
different values of n is considered. The notation z'Sz denotes $z_k \ge z'_k$ for k = 1,
..., N, and > holds for at least one value of k.
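The following formula is used repeatedly:

(1.1) $P(Z = z) = N! \int \cdots \int_{0 \le y_1 \le \cdots \le y_N \le \infty} \prod_{i=1}^{N} \bigl[ f^{z_i}(y_i, \theta)\, f^{1 - z_i}(-y_i, \theta)\, dy_i \bigr]$.

The null hypothesis of concern is F(−x, 0) + F(x, 0) = 1, i.e., symmetry
about 0. Under $H_0$, $P(Z = z) = 2^{-N}$ for each z. An alternative of particular
interest is

$F(x, \theta) = \int_{-\infty}^{x} (2\pi)^{-1/2} e^{-(t - \theta)^2 / 2}\, dt, \qquad \theta > 0$.

All of the following results apply to this alternative hypothesis.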


2. The case of fixed n.
THEOREM 2.1: If

a) $f(x, \theta) = u(x)\, v(\theta)\, e^{a(x) b(\theta)}$
b) $v(\theta) \ge 0$
c) $u(x) = u(-x) > 0$
d) If x < y then a(x) < a(y)
e) $b(\theta) > 0$

then z'Lz implies Δ = P(Z = z) − P(Z = z') > 0.
PROOF: Using (1.1) obtain

$\Delta = N! \int \cdots \int_{0 \le y_1 \le \cdots \le y_N \le \infty} A(y_i, y_j) \prod_{t=1}^{N} \bigl[ f^{z_t}(y_t, \theta)\, f^{1 - z_t}(-y_t, \theta)\, dy_t \bigr]$

where

$A(y_i, y_j) = 1 - \dfrac{f(y_i, \theta)\, f(-y_j, \theta)}{f(-y_i, \theta)\, f(y_j, \theta)} = 1 - \exp\{ b(\theta) [ a(y_i) - a(-y_i) + a(-y_j) - a(y_j) ] \}$.

The theorem is proved by showing $A(y_i, y_j) \ge 0$, which follows since the exponent
is negative due to the monotonicity of a(x).

THEOREM 2.2: If

a) $f(x, \theta) = f(x - \theta) = f(\theta - x)$

b) If x > y and θ > θ' then

$\begin{vmatrix} f(x, \theta) & f(x, \theta') \\ f(y, \theta) & f(y, \theta') \end{vmatrix} > 0$

c) θ > 0

then z'Lz implies Δ = P(Z = z) − P(Z = z') > 0.
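PROOF:

$\Delta = N! \int \cdots \int_{0 \le y_1 \le \cdots \le y_N \le \infty} B(y_i, y_j) \Bigl\{ \prod_{t \ne i, j} f^{z_t}(y_t, \theta)\, f^{1 - z_t}(-y_t, \theta) \Bigr\} \Bigl[ \prod_{t=1}^{N} dy_t \Bigr]$

where $B(y_i, y_j) = f(y_j, \theta) f(-y_i, \theta) - f(-y_j, \theta) f(y_i, \theta)$ and the proof is
completed by showing $B(y_i, y_j) > 0$. In assumption b let $x = y_j$, $y = y_i$,
and θ' = −θ so that

$\begin{vmatrix} f(y_j, \theta) & f(y_j, -\theta) \\ f(y_i, \theta) & f(y_i, -\theta) \end{vmatrix} = f(y_j - \theta) f(y_i + \theta) - f(y_j + \theta) f(y_i - \theta)$.

Now use f(x − θ) = f(θ − x), assumption a; hence

$0 < f(y_j - \theta) f(y_i + \theta) - f(y_j + \theta) f(y_i - \theta)$
$\;\; = f(y_j, \theta) f(-y_i, \theta) - f(-y_j, \theta) f(y_i, \theta)$
$\;\; = B(y_i, y_j)$.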

3. The case of variable n.

THEOREM 3.1: Under the assumptions of Theorem 2.1, if z'Sz, then
Δ = P(Z = z) − P(Z = z') > 0.

PROOF: It is sufficient to consider only the special case $z_k = z'_k$ for all k = 1,
..., N except k = i, where $z_i = 1$ and $z'_i = 0$. Then

$\Delta = N! \int \cdots \int_{0 \le y_1 \le \cdots \le y_N \le \infty} C(y_i) \Bigl\{ \prod_{k=1}^{N} f^{z_k}(y_k, \theta)\, f^{1 - z_k}(-y_k, \theta)\, dy_k \Bigr\}$

and the proof is completed by showing $C(y_i) = 1 - f(-y_i, \theta)\, [f(y_i, \theta)]^{-1} > 0$.

Using the special form of f(x, θ), $C(y_i) = 1 - \exp\{ b(\theta) [ a(-y_i) - a(y_i) ] \}$, and
again the exponent is negative because of the monotonicity of a(y).

THEOREM 3.2: If
a) $f(x, \theta) = f_\theta(x - \theta) = f_\theta(\theta - x)$
b) If x > y > 0 then $f_\theta(y) > f_\theta(x)$
c) θ > 0
then z'Sz implies Δ = P(Z = z) − P(Z = z') > 0.
PROOF:

$\Delta = N! \int \cdots \int_{0 \le y_1 \le \cdots \le y_N \le \infty} D(y_i) \Bigl\{ \prod_{k \ne i} f^{z_k}(y_k, \theta)\, f^{1 - z_k}(-y_k, \theta) \Bigr\} \Bigl[ \prod_{k=1}^{N} dy_k \Bigr]$

and it is sufficient to show that $D(y_i) = f(y_i, \theta) - f(-y_i, \theta) > 0$. First, using
assumption a, $D(y_i) = f_\theta(y_i - \theta) - f_\theta(-y_i - \theta) = f_\theta(y_i - \theta) - f_\theta(y_i + \theta)$.
Now if $y_i > \theta$ the result follows from b, since $y_i - \theta < y_i + \theta$. If $y_i < \theta$ the
result follows from b when we write $D(y_i) = f_\theta(\theta - y_i) - f_\theta(y_i + \theta)$.

Remark 1. In Theorem 3.2, writing $f(x, \theta) = f_\theta(x - \theta)$ allows f(x, θ) not only
to be translations of the $H_0$ but also other changes, such as changes in scale,
can occur.

Remark 2. The assumptions of Theorem 2.2 imply those of Theorem 3.2 but
not conversely. If in b of Theorem 2.2 we set θ' = 0, 2θ = x + y, and 0 < y < x,
we obtain b of Theorem 3.2. The Cauchy density is a counterexample to the
converse.


4. Some partial orderings. If the assumptions of Theorems 2.1 and/or of
2.2 and 3.2 hold, then the following diagrams are obtained:

N = 1
1 → 0

(where P(Z = z) > P(Z = z') ≡ z → z')

N = 2
11 → 01 → 10 → 00

N = 3
111 → 011 → 101 → 110
             ↓     ↓
            001 → 010 → 100 → 000

N = 4
1111 → 0111 → 1011 → 1101 → 1110
0110
0011 → 0101          1010 → 1100
1001
0001 → 0010 → 0100 → 1000 → 0000
[Diagram for N = 4: the vertical and diagonal arrows joining these rows did not survive extraction.]
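
The diagrams can be regenerated mechanically from the two relations L and S. The Python sketch below is an illustration, not the author's procedure: the function names are hypothetical, rank orders are represented as tuples of 0's and 1's, and the partial order is taken to be the transitive closure of single L steps (move a 1 to the right into a 0) and single S steps (replace a 0 by a 1).

```python
from itertools import product

def one_step_up(zp):
    """Rank orders z reachable from z' by one S step or one L step."""
    n = len(zp)
    ups = set()
    for i in range(n):
        if zp[i] == 0:
            ups.add(zp[:i] + (1,) + zp[i + 1:])      # S: replace a 0 by a 1
        else:
            for j in range(i + 1, n):
                if zp[j] == 0:                       # L: move the 1 at i right to j
                    z = list(zp)
                    z[i], z[j] = 0, 1
                    ups.add(tuple(z))
    return ups

def strictly_above(zp):
    """All rank orders shown to be more probable than z' by chains of steps."""
    seen, stack = set(), [zp]
    while stack:
        current = stack.pop()
        for z in one_step_up(current):
            if z not in seen:
                seen.add(z)
                stack.append(z)
    return seen

def comparable(a, b):
    """True if the theorems order the probabilities of a and b."""
    return a in strictly_above(b) or b in strictly_above(a)

def covers(n):
    """Arrows of the diagram: pairs (z, z') with no rank order strictly between."""
    arrows = []
    for zp in product((0, 1), repeat=n):
        above = strictly_above(zp)
        for z in above:
            if not any(z in strictly_above(mid) for mid in above if mid != z):
                arrows.append((z, zp))
    return arrows
```

For N = 3, `covers(3)` returns the arrows of the diagram above, and `comparable((0, 0, 1), (1, 1, 0))` is false, which is the pair left unordered by Sections 2 and 3.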
Now consider the uniform distribution f(x, θ) = 1 for θ − 1/2 ≤ x ≤ θ + 1/2
and 0 otherwise, 0 ≤ θ ≤ 1/2. If n' is the length of the last run of 1's in z, or n' =
the number of the positive observations greater than the maximum of the absolute
values of the negative observations, then

(4.1) $P(Z = z) = \sum_{i=0}^{n'} \binom{N}{i} \left( \tfrac{1}{2} - \theta \right)^{N-i} (2\theta)^{i}$.

To obtain (4.1), begin with

$P(Z = z) = \sum_{i=0}^{n'} P(Z = z \mid i \text{ observations} > \tfrac{1}{2} - \theta) \times P(i \text{ observations} > \tfrac{1}{2} - \theta)$

and use

$P(Z = z \mid i \text{ observations} > \tfrac{1}{2} - \theta) = 2^{-(N-i)}$,

$P(i \text{ observations} > \tfrac{1}{2} - \theta) = \binom{N}{i} (2\theta)^{i} (1 - 2\theta)^{N-i}$.

Holding θ fixed, P(Z = z) is an increasing function of n', and otherwise does
not depend on z. Thus, the most powerful rank order tests depend solely on n'.
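
Formula (4.1) is easy to evaluate directly. The Python sketch below is illustrative only; the function names are hypothetical and rank orders are written as tuples of 0's and 1's ordered by increasing absolute value.

```python
from itertools import product
from math import comb

def last_run_of_ones(z):
    """n': the length of the final run of 1's in the rank order z."""
    n_prime = 0
    for bit in reversed(z):
        if bit != 1:
            break
        n_prime += 1
    return n_prime

def prob_rank_order(z, theta):
    """P(Z = z) from (4.1) for the uniform density on [theta - 1/2, theta + 1/2]."""
    N = len(z)
    n_prime = last_run_of_ones(z)
    return sum(comb(N, i) * (0.5 - theta) ** (N - i) * (2 * theta) ** i
               for i in range(n_prime + 1))

# The probabilities of all 2**N rank orders should add up to one.
total = sum(prob_rank_order(z, 0.2) for z in product((0, 1), repeat=4))
```

With θ = 0 every rank order has probability $2^{-N}$, as required under the null hypothesis, and for θ > 0 the value grows with n' and is the same for any two rank orders sharing n'.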


5. A statistical application. For the normal alternative hypotheses, mentioned
at the end of Section 1, several test statistics have been proposed:
a. On intuitive grounds Wilcoxon proposed the statistic

$T_W = \sum_{i=1}^{N} i\, z_i$.

b. Fraser [1] showed the locally most powerful rank order test is of the form
$T_N = \sum_{i=1}^{N} z_i\, E(X_{Ni})$ where $X_{Ni}$ is the ith order statistic from the chi
distribution with one degree of freedom.

Both of these statistics are of the form $T = \sum z_i a_i$ where the $a_i$ form an
increasing sequence. It is easily verified that if z'Lz and/or z'Sz then
T(z) > T(z'). Thus statistics of this form take full advantage of the results
of this paper, i.e., using these statistics the known more probable rank orders
are put into the critical region first.
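
The claim that T(z) > T(z') whenever z'Lz or z'Sz can be checked exhaustively for small N. The Python sketch below is an illustration with hypothetical function names; it uses the Wilcoxon weights $a_i = i$ and assumes the weights are positive as well as increasing (positivity is what makes a single S step raise T).

```python
from itertools import product

def linear_rank_statistic(z, weights):
    """T(z) = sum of a_i over the positions i where z_i = 1."""
    return sum(a for bit, a in zip(z, weights) if bit == 1)

def single_steps(n):
    """Yield (z_prime, z) for every single S step or L step on rank orders of length n."""
    for zp in product((0, 1), repeat=n):
        for i in range(n):
            if zp[i] == 0:
                yield zp, zp[:i] + (1,) + zp[i + 1:]   # z' S z
            else:
                for j in range(i + 1, n):
                    if zp[j] == 0:
                        z = list(zp)
                        z[i], z[j] = 0, 1              # z' L z
                        yield zp, tuple(z)

def respects_partial_order(weights):
    """True if T strictly increases along every single step."""
    n = len(weights)
    return all(linear_rank_statistic(z, weights) > linear_rank_statistic(zp, weights)
               for zp, z in single_steps(n))

wilcoxon = [1, 2, 3, 4]   # a_i = i for N = 4
```

Since every relation z'Lz or z'Sz is a chain of such single steps, a strict increase along each step gives T(z) > T(z') along the whole chain.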


6. Normal slippage. The theorems of Sections 2 and 3 do not help in the ordering
of $P_1 = P(Z = (0, 0, 1))$ and $P_2 = P(Z = (1, 1, 0))$ for normal alternatives.
If $P_1 > P_2$ then the partial order for N = 3 given in Section 4 becomes
the simple order:

111 → 011 → 101 → 001 → 110 → 010 → 100 → 000

THEOREM 6.1:² If $X_1, \ldots, X_N$ (N ≥ 3) are independently and normally distributed,
each with mean θ (> 0) and variance 1, then Δ = P(Z = z) − P(Z = z') > 0,
where z and z' are identical except $z_1 = z_2 = z'_3 = 0$ and $z'_1 = z'_2 = z_3 = 1$.

² M. Sobel proved this result for N = 3 at the 1958 Summer Statistical Institute sponsored
by the National Science Foundation.
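PROOF: Using (1.1),

$\Delta = \dfrac{N!}{(2\pi)^{3/2}} \int \cdots \int_{0 \le y_1 \le \cdots \le y_N \le \infty} \Bigl\{ \prod_{i=4}^{N} f^{z_i}(y_i, \theta)\, f^{1 - z_i}(-y_i, \theta) \Bigr\} \Bigl[ \prod_{i=1}^{N} dy_i \Bigr]$
$\qquad \times \{ \exp[ -\tfrac{1}{2} ( y_1^2 + y_2^2 + y_3^2 + 3\theta^2 ) ] \} \times [ e^{\theta (y_3 - y_1 - y_2)} - e^{-\theta (y_3 - y_1 - y_2)} ]$.

Now make the transformation $y_1 = w_1$, $y_2 = w_1 + w_2$, $y_3 = w_1 + w_2 + w_3$,
and $y_i = w_i$ for i = 4, ..., N. The Jacobian is 1 and the region of integration
becomes $0 \le w_i$ for i = 1, 2, 3, and $\sum_{i=1}^{3} w_i \le w_4 \le w_5 \le \cdots \le w_N \le \infty$.
Then

$\Delta = \dfrac{N!}{(2\pi)^{3/2}} \int \cdots \int \Bigl\{ \prod_{i=4}^{N} f^{z_i}(w_i, \theta)\, f^{1 - z_i}(-w_i, \theta) \Bigr\} \Bigl[ \prod_{i=1}^{N} dw_i \Bigr]$
$\qquad \times \{ \exp[ -\tfrac{1}{2} ( w_1^2 + (w_1 + w_2)^2 + (w_1 + w_2 + w_3)^2 + 3\theta^2 ) ] \}$
$\qquad \times [ e^{\theta (w_3 - w_1)} - e^{-\theta (w_3 - w_1)} ]$.

The above integral is equivalent to the following integral, where the region of
integration is $0 \le w_1 \le w_3$, $0 \le w_2$, and $\sum_{i=1}^{3} w_i \le w_4 \le \cdots \le w_N \le \infty$:

$\Delta = \dfrac{N!}{(2\pi)^{3/2}} \int \cdots \int \Bigl\{ \prod_{i=4}^{N} f^{z_i}(w_i, \theta)\, f^{1 - z_i}(-w_i, \theta) \Bigr\} \Bigl[ \prod_{i=1}^{N} dw_i \Bigr]$
$\qquad \times \{ \exp[ -\tfrac{1}{2} ( w_1^2 + (w_1 + w_2)^2 + (w_1 + w_2 + w_3)^2 + 3\theta^2 ) ]$
$\qquad\quad - \exp[ -\tfrac{1}{2} ( w_3^2 + (w_3 + w_2)^2 + (w_3 + w_2 + w_1)^2 + 3\theta^2 ) ] \}$
$\qquad \times [ e^{\theta (w_3 - w_1)} - e^{-\theta (w_3 - w_1)} ]$.

For the region of integration each of the factors in the above integrand is clearly
> 0 except for the { }. To show { } > 0, prove the equivalent inequality

$w_3^2 + (w_3 + w_2)^2 > w_1^2 + (w_1 + w_2)^2 \;\equiv\; w_3 (w_3 + w_2) > w_1 (w_1 + w_2)$,

which is clearly the case since $w_3 > w_1 > 0$.
Theorem 6.1 implies a simple order for the five most probable rank orders
against normal slippage.

THE DISTRIBUTION OF A GENERALIZED $D_n^+$ STATISTIC¹


By MEYER DWASS


Northwestern University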

1. Introduction and summary. Let $F_n(x)$ be the empirical c.d.f. of n independent
random variables, each distributed according to the same continuous c.d.f. F(x).
The major object of this paper is to obtain in explicit form the probability law
of the random variable

$D_n^+(\gamma) = \sup_x \{ F_n(x) - \gamma F(x) \}$.

It is no loss of generality to suppose that F(x) is the c.d.f. of the uniform
distribution on [0, 1], so this assumption will be held throughout the paper.

When γ = 1, then $D_n^+(1)$ is the usual one-sided goodness of fit statistic whose
asymptotic distribution was first derived by Smirnov [6]. We obtain in several
different forms (formulas 2.2 and 2.3) an expression for

$P(D_n^+(\gamma) < a) = P(F_n(x) \le a + \gamma x,\ 0 \le x \le 1)$.

Formula (2.2) agrees with the one found by Birnbaum and Tingey [2] when
γ = 1, which is the "classical" case. As a matter of fact, it seems to have been
overlooked that this formula, for finite n, had already appeared in a paper by
Smirnov [6]. The new formula (2.3) would seem to involve fewer computations
for actual numerical evaluation. One rather remarkable fact which results from
(2.3) is that

$P(F_n(x) \le \gamma x,\ 0 \le x \le 1) = \begin{cases} 1 - \dfrac{1}{\gamma}, & \gamma > 1 \\ 0, & \gamma \le 1 \end{cases}$

for any n. This was noted by Daniels [4] and was rediscovered by Robbins [5].
Using (2.3) it is easy to evaluate $\lim_{n \to \infty} P(F_n(x) \le a(n) + \gamma x)$ where
γ (γ > 1) is fixed and a(n) = d/n, where d is fixed. The limiting distribution
when γ > 1 can be used to derive some facts about the Poisson process which
were recently discovered by Baxter and Donsker [1].
The methods used are elementary. To assist the reader, the results are all
listed in Section 2 and Section 3 is devoted to giving proofs.


2. Statement of results. First a few pieces of notation are introduced. Let

$P_n(a, \gamma) = P(F_n(x) \le a + \gamma x,\ 0 \le x \le 1) = P(D_n^+(\gamma) < a)$,

and let

$C_n(a, \gamma, i) = \dfrac{a + \gamma - 1}{\gamma} \binom{n}{i} \left( \dfrac{1 - a - i/n}{\gamma} \right)^{n-i} \left( 1 - \dfrac{1 - a - i/n}{\gamma} \right)^{i-1}$.

For simplicity, whenever it is reasonable to do so, $P_n$ and $C_n(i)$ are used instead
of the more complicated symbols.
It is assumed hereafter that

(2.1) 0 < a < 1, a + γ > 1, γ > 0,

for otherwise $P_n$ becomes trivially either 0 or 1.

Received January 20, 1959; revised April 1, 1959.
THEOREM 1:

(2.2) $P_n = 1 - \sum_{i=0}^{k} C_n(i)$,

or, equivalently,

(2.3) $P_n = \sum_{i=k+1}^{n} C_n(i)$,

where the integer k is defined by $k/n \le 1 - a < (k + 1)/n$.
Remark on Theorem 1: When γ = 1, formula (2.2) agrees with the result of
Birnbaum and Tingey. However, when a is of the order of $1/\sqrt{n}$, (2.3) will
usually require many fewer values of $C_n(i)$ to compute. For example, for n = 50,
Table 1 of [2] indicates that a varies roughly between 1/7 and 1/4 for those
probabilities "interesting" for statistical applications. Hence the number of $C_n(i)$
terms to be computed using (2.2) ranges from about 37 to 42 for these a's,
whereas using (2.3) the range is from 7 to 12 terms.

Setting a = 0 in (2.3) yields a

COROLLARY TO THEOREM 1 (Daniels [4], Robbins [5]):

$P(F_n(x) \le \gamma x,\ 0 \le x \le 1) = \begin{cases} 1 - \dfrac{1}{\gamma}, & \gamma > 1 \\ 0, & \gamma \le 1. \end{cases}$

It is interesting that this result does not depend on n.
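
The independence from n can be illustrated by simulation. The Python sketch below is a Monte Carlo check under stated assumptions, not part of the paper: the function names, the number of trials, and the seed are arbitrary choices. It uses the fact that $F_n(x) - \gamma x$ can only reach its supremum at a jump point, so the event $F_n(x) \le \gamma x$ for all x reduces to $i/n \le \gamma x_{(i)}$ for each order statistic.

```python
import random

def stays_below_line(sample, gamma):
    """True if the empirical c.d.f. of `sample` satisfies F_n(x) <= gamma * x on [0, 1]."""
    n = len(sample)
    # F_n jumps to i/n at the i-th order statistic; only the jump points need checking.
    return all(i / n <= gamma * x for i, x in enumerate(sorted(sample), start=1))

def estimate_probability(n, gamma, trials=20000, seed=0):
    """Monte Carlo estimate of P(F_n(x) <= gamma x, 0 <= x <= 1) for uniform samples."""
    rng = random.Random(seed)
    hits = sum(stays_below_line([rng.random() for _ in range(n)], gamma)
               for _ in range(trials))
    return hits / trials

# Estimates for several sample sizes with gamma = 2; each should be near 1 - 1/2.
estimates = {n: estimate_probability(n, 2.0) for n in (1, 5, 20)}
```

For n = 1 the result is immediate, since the event is simply $X \ge 1/\gamma$; the simulation suggests the same value for larger n.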

THEOREM 2: Let a = d/n, where d is a fixed positive real number, and let γ be
greater than 1. Then

(2.4) $\lim_{n \to \infty} P_n(d/n, \gamma) = \left( 1 - \dfrac{1}{\gamma} \right) \sum_{i=0}^{[d]} \dfrac{1}{i!} \left( \dfrac{i - d}{\gamma} \right)^{i} e^{-(i - d)/\gamma}$.

Remarks on Theorem 2:

a) The interesting fact here is that when γ > 1 the proper norming for a
requires it to be of the order of 1/n rather than $1/\sqrt{n}$ as in the case γ = 1.
Contrary to what one would expect, the derivation is much more elementary
when γ > 1 than when γ = 1.

b) The right-hand side of (2.4) is the same as an expression obtained by
Baxter and Donsker [1] in connection with the Poisson process. Theorem 2
immediately gives the same result, which is summarized in the following corollary.

COROLLARY TO THEOREM 2: Let Y(t), 0 ≤ t < ∞, be the Poisson process with
stationary and independent increments and parameter λ > 0, and Y(0) = 0.
Let γ > λ, and d be positive. Then

$P(Y(t) \le d + \gamma t,\ 0 \le t < \infty) = \left( 1 - \dfrac{\lambda}{\gamma} \right) \sum_{i=0}^{[d]} \dfrac{1}{i!} \left( \dfrac{\lambda (i - d)}{\gamma} \right)^{i} e^{-\lambda (i - d)/\gamma}$.

3. Proofs. 

I. Proof of Theorem 1, equation (2.2). The basic idea used in the proof is the
following: let $x_1 < x_2 < \cdots < x_n$ be the ordered values of n independent random
variables, each uniformly distributed over (0, 1). Then, it is well known that,
given $x_n$, the conditional distribution of

$x_1/x_n,\ x_2/x_n,\ \ldots,\ x_{n-1}/x_n$

is that of the ordered values of (n − 1) independent random variables, each
uniformly distributed over (0, 1). Using this fact it is easy to verify the following
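conditional probability statements:

$P(F_n(x) \le a + \gamma x \mid x_n = t) = \begin{cases} 1, & \text{if } \dfrac{1 - a}{\gamma} \le t \le 1 \text{ and } \dfrac{n - 1}{n} \le a < 1, \\ P_{n-1}\!\left( \dfrac{n}{n-1}\, a,\ \dfrac{n}{n-1}\, \gamma t \right), & \text{if } \dfrac{1 - a}{\gamma} \le t \le 1 \text{ and } a < \dfrac{n - 1}{n}, \\ 0, & \text{if } 0 \le t < \dfrac{1 - a}{\gamma}. \end{cases}$

Using the fact that the frequency function of $x_n$ is

$\begin{cases} n t^{n-1}, & 0 \le t \le 1, \\ 0, & \text{otherwise}, \end{cases}$

we have the basic recursion relationship

(3.1) $P_n(a, \gamma) = \begin{cases} \displaystyle \int_{(1-a)/\gamma}^{1} P_{n-1}\!\left( \frac{n}{n-1}\, a,\ \frac{n}{n-1}\, \gamma t \right) n t^{n-1}\, dt, & a < \dfrac{n - 1}{n}, \\ \displaystyle \int_{(1-a)/\gamma}^{1} n t^{n-1}\, dt = 1 - \left( \frac{1 - a}{\gamma} \right)^{n}, & \dfrac{n - 1}{n} \le a < 1. \end{cases}$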

An induction argument can now be applied to prove (2.2). It is trivially
true when n = 1. Assume now that it holds for arbitrary n. By this induction
hypothesis,

$P_n\!\left( \dfrac{n+1}{n}\, a,\ \dfrac{n+1}{n}\, \gamma t \right) = 1 - \sum_{i=0}^{k} C_n\!\left( \dfrac{n+1}{n}\, a,\ \dfrac{n+1}{n}\, \gamma t,\ i \right)$,

where k is defined by k/n ≤ 1 − ((n + 1)/n) a < (k + 1)/n, or equivalently
by (k + 1)/(n + 1) ≤ (1 − a) < (k + 2)/(n + 1). By a routine, but tedious,
computation which is omitted,

$\displaystyle \int_{(1-a)/\gamma}^{1} C_n\!\left( \frac{n+1}{n}\, a,\ \frac{n+1}{n}\, \gamma t,\ i \right) (n + 1)\, t^{n}\, dt = C_{n+1}(a, \gamma, i + 1)$.

Hence, applying (3.1), it follows that (2.2) is true for n + 1, which completes
the proof of the first part of Theorem 1.


II. Proof of Theorem 1, equation (2.3). This follows from 2.2 by means of part 
a) of the following lemma: 
LEMMA:

a) $\sum_{i=0}^{n} C_n(i) = 1$

b) $\sum_{i=0}^{n} \binom{n}{i} (A + i)^{i} (B - i)^{n-i-1} = (A + B)^{n} / (B - n)$


. 1 os 1 
e) Sy (i) us+oe- *) ~ A —1y(B—-nn+1) 


-[((A+B)"(A+B—n—1) —(B4+1)"(B—n)] 
= vn-j-1 (A +B)" — (B+ 1)" 
d) > (7) 4+ - j) 3298S 


where A ≠ 1, B ≠ n. Part b) is a formula of Abel's which is referred to in Lemma
1 of [3]. Part c) is proved in [3]. Part d) is proved by writing


n—l 


oy ah (A + j)(B- 5)" 


= (8+ DR 5 E ”) (A+j(B- j= > (*) (A+p(B-p"™, 


and by then applying b) and c). Part a) now follows from d) as follows. C,(7) 
can be expressed as 


C,(%) = Sut (n —na-l1— (4 sie ayy eo 
(ny)" 


' mf(y+ta-— n 
(ny —n+na+1+ (i —1)) (rts eer, 
Now let i − 1 = j, nγ − n + na + 1 = A, n − na − 1 = B; then


= a 1 vyta-1— n | “\j +\n—1—j 
Ye) = Ai (ts FP )ate-a 
and a) follows from d) by routine algebra. The lemma is completely proved.
Now part a) of the lemma immediately implies (2.3), given the truth of (2.2).

III. Proof of Theorem 2. This follows in a routine way from (2.3). 

IV. Proof of corollary to Theorem 2. It is sufficient to suppose that λ = 1, the
general case easily following from this special one.

Let A(T) be the event that

$\dfrac{Y(t)}{Y(T)} \le \dfrac{d}{Y(T)} + \gamma\, \dfrac{t}{T}, \qquad 0 \le t \le T$,

where Y(t)/Y(T) can be defined as 0 if Y(T) = 0.
According to the well-known relationship between the Poisson process and
uniformly distributed random variables,

$P(A(T) \mid Y(T) = n) = P_n(d/n, \gamma), \qquad n \ge 1$.

Hence

$P(A(T)) = \sum_{n} P(A(T) \mid Y(T) = n)\, P(Y(T) = n) = \sum_{n} P_n(d/n, \gamma)\, \dfrac{e^{-T} T^{n}}{n!}$.

Since $P_n(d/n, \gamma)$ approaches the right side of (2.4) as n → ∞, and
since $e^{-T} T^{n}/n! \to 0$ as T → ∞ for any fixed n, an easy argument proves that

$P(A(T)) \to \lim_{n \to \infty} P_n(d/n, \gamma)$ as T → ∞.

Since Y(T)/T converges to 1 with probability 1, it is not hard to show that

$\lim_{T \to \infty} P(A(T)) = P(Y(t) \le d + \gamma t,\ 0 \le t < \infty)$,

which completes the proof.

NULL DISTRIBUTION OF THE HODGES BIVARIATE SIGN
TEST


By JEROME KLOTZ¹


University of California, Berkeley


0. Summary. This note presents the solution to the problem of obtaining the
full null distribution for the bivariate sign test proposed by J. L. Hodges, Jr.,
1955 [1]. The partial solution of [1] is completed, and the table of [1] is extended
to give the full distribution up to a sample size (n) of 30. In addition a
partial table is included for sample sizes from 31 to 50.


1. Introduction. Using the notation given in [1], the problem is that of count- 
ing the number of cycles having a given value of K. This problem was solved in 
[1] only for the case k < n/3, or n < 3h where h = n — 2k. 


2. Counting the cycles. As stated in [1] the operation of rotation generates 
equivalence classes of cycles. We count the classes by selecting a representative 
member called a pattern. The number of cycles in each class is first determined. 
The total number of cycles for a given k value is then obtained by summing these 
numbers over all patterns corresponding to k. 

To every cycle corresponds a walk in the plane. A plus sign corresponds to a
step in the y direction, a minus sign to a step in the x direction. Let us call a
point (x, y) a departure point for a path if it lies on the line y = x (or y = x + h)
and the path reaches the line y = x + h (y = x) before returning to the line
y = x (y = x + h). Thus such points depend upon the given value of h and the
particular path. Further, let us call the path between consecutive departure
points a flight.


3. Specifying the patterns. To every cycle corresponds the particular cycle 
called a pattern which is obtained from the first by rotation and has the follow- 
ing properties: 

(i) The minimum number of minus signs above the diameter (k) is attained 
for this cycle. 

(ii) The cycle starts with a plus and ends the nth step with a plus. 

(iii) The first and hence nth points are departure points. 

To see the existence of such a pattern for a given cycle, let the cycle be rotated 
until Condition (i) is satisfied. Condition (ii) must also be satisfied, otherwise, 
using the fact that diametrically opposed signs are opposite, rotation by one 
would decrease k contrary to the initial rotation. If Condition (iii) is not satis- 
fied, after an even number of steps the path from the first point (0, 0) returns 
to the line y = x. Thus, we can rotate the cycle so that the first point becomes a
departure point without changing the number of plus or minus signs up.
Conditions (i), (ii), and (iii) are now satisfied. We see that a pattern defined by the
three conditions is not necessarily unique. Rotations of patterns taking depar- 
ture points into like departure points may result in a different pattern. However, 
this need not concern us provided we count only one pattern for every class of 
patterns obtained by cyclic permutation. It may be noted that the definition of 
a pattern given here differs from that given in [1]. For example, [1] does not spe- 
cify a pattern for cycles with alternating signs. However, under the restriction 
k < n/3, the two definitions are equivalent. For, under the restriction, a pat- 
tern satisfying the above conditions satisfies those of [1], and the pattern defined 
in [1] is unique. 


4. Counting the cycles for a pattern. To count the number of cycles corresponding
to a particular pattern we show that, to have fewer than 2n cycles for a
pattern, the pattern must have an odd number (greater than one) of flights,
which are all "equivalent." We specify two flights to be equivalent if they are
identical, or if one can be obtained from the other by interchanging plus and
minus signs (e.g., ++−++ is equivalent (~) with −−+−−). Assume
that after a rotation of less than 2n steps the pattern repeats itself (we take a
pattern as a starting point for convenience). We must have departure points
going into departure points of the same kind and hence the rotation consists of
an even number of flights. Let us denote the flights by $a_i$ and suppose there are
2t + 1 flights in the pattern, i = 1, 2, ..., 2t + 1. Next represent the cycle by
$(a_1, a_2, a_3, a_4, \ldots, a_{2t+1} \mid a_{2t+2}, \ldots, a_{4t+2})$. After a rotation which repeats
the pattern, say a rotation of 2p flights, we obtain


a Gag “ Gignn “™ °°” 


ae 


where $a_m = a_{m \,(\mathrm{mod}\ 4t+2)}$. From the symmetry (diagonally opposed signs are
opposite) we have $a_1 \sim a_{2t+2}$, $a_2 \sim a_{2t+3}$, ..., $a_{2t+1} \sim a_{4t+2}$. Solving the
system, we obtain $a_1 \sim a_2 \sim \cdots \sim a_{2t+1} \sim \cdots$. Thus for this case we have 2n/(2t + 1)
cycles for the pattern and otherwise 2n cycles.


5. Counting the patterns. To count the patterns we count them according to
their number of flights: one, three, five, etc. For (2l + 1)h ≤ n < (2l + 3)h
we will have patterns with 1, 3, 5, ..., 2l + 1 flights. For the case of only one
flight, [1] gives the formula $2^n P[K = k] = 2n\, m_h(n)$, where $m_h(n)$ is the number
of ways of going from (0, 0) to (k, n − k) hitting the line y = x + h only at
the nth step: the gambler's ruin problem (see [1]). Generalizing, we obtain the
formula for the case (2l + 1)h ≤ n < (2l + 3)h, where we have up to 2l + 1
flights


2"P|K = k] = 2nm,(2) 


; 3 ‘ 3 
+ = I(,°,) > II m(n;) + (3) > Il main) | 


ni<ne<cng i=l Nn i=noFn3 


Ni tnetng=n Ny tnetng=—n 
4" ("5") #2) ("5") +) | 


21+1 
1) asc Senna EA mm) + (9) 1) ETmioo 


FS es ) 5 Tminy + 
rT) ; re eres Tt-1 
' ze + 2 21+ 1 
2 ), =---=ne] I] a(n | atenemm 2/ cy 1 I(, J. eee ae.) 


, 21+ 1 m(q) 21+ 1 m(q) 
+a(, 74? )( 21 Jet On, tT) J+ 


° ° t ° 
where p is the number of different r; : > tml ry = 3; a4,°-+: , q; r¢ are integers; 


(, bs k ) = n!/k,! --+ k,! is the multinomial coefficient; and J is an indicator 
y° °° At 
function. 

The preceding table completes the table of [1] and gives values of P[K ≤ k]
to 5D for all values of k and n = 1(1)30. Further, the table gives values of P
to 5D for k up to a value which makes P just greater than 10 per cent and
n = 31(1)50.


6. Acknowledgment. I wish to thank Professor J. L. Hodges, Jr. for suggesting 
a method of attack that led to the solution.

EXACT NONPARAMETRIC TESTS FOR RANDOMIZED BLOCKS


By JOHN E. WALSH¹
Lockheed Aircraft Corporation


1. Summary. A class of nonparametric procedures for testing the statistical 
identity of treatments in randomized block experiments is suggested and dis- 
cussed. The suggested procedures are squarely based on experimental within- 
block randomizations, and they may be chosen so as to have special power 
against particular alternatives. The blocks are assumed to be statistically inde- 
pendent but no assumption is made concerning dependence within the various 
blocks. The basic idea is to obtain from each block a statistic that is, under the 
null hypothesis, symmetrically distributed about zero and then to apply to the 
set of these statistics a nonparametric test of symmetry about zero. The ob- 
servational data can be of any quantitative type. 
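
The two-stage idea described above can be sketched in Python. This is only an illustration under stated assumptions: each block is taken to hold two treatments A and B assigned at random, the block statistic is the difference of the two observations, and the exact two-sided sign test stands in for the "nonparametric test of symmetry about zero" (other symmetry tests could be used). The function names and the data are hypothetical.

```python
from math import comb

def block_statistics(blocks):
    """One statistic per block: observation under A minus observation under B."""
    return [obs_a - obs_b for obs_a, obs_b in blocks]

def sign_test_p_value(stats):
    """Exact two-sided sign test of symmetry about zero (zero statistics are dropped)."""
    nonzero = [s for s in stats if s != 0]
    n = len(nonzero)
    positives = sum(1 for s in nonzero if s > 0)
    tail = min(positives, n - positives)
    # Under the null hypothesis each nonzero sign is + or - with probability 1/2.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)

# Hypothetical data: eight independent blocks, (observation under A, observation under B).
example_blocks = [(5.1, 4.2), (6.0, 5.5), (4.8, 4.1), (7.2, 6.9),
                  (5.5, 5.0), (6.3, 5.1), (4.9, 4.4), (5.8, 5.2)]
p_value = sign_test_p_value(block_statistics(example_blocks))
```

Here every block difference is positive, so the p-value is $2 / 2^8$; the validity of the test rests only on the within-block randomization and the independence of the blocks.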


2. Introduction. This paper considers experimental designs that are laid out 
in statistically independent blocks. If care is exercised, the blocks can usually be 
separated enough in distance, time, etc., to warrant the assumption of statistical 
independence. 

Within a block, the assignment of the treatments investigated in that block 
can be of either a balanced or an unbalanced nature. For a given design, some 
blocks might be balanced and others unbalanced. The within-block assignments 
of treatments to locations are determined by a set of independent randomization 
processes as follows: the treatments of each block are partitioned into disjoint 
classes, to each class there is assigned a set of eligible locations within the block, 
and the assignments of treatments within a class to their eligible locations, for 
some classes, those of type A, are strictly random (all assignments equally likely), 
possibly dependent from class to class but independent from block to block. A 
block always contains at least one class of type A and each of these contains at 
least two treatments. For the remaining classes, those of type B, assignment to 
location may be random or fixed. The partitioning scheme, which may vary from 
block to block, is selected on the basis of the null hypothesis and the alternative 
hypotheses being investigated. 

The most elementary type of situation considered is that where, for each 
block, the treatments (at least two per block) are not partitioned. Then all the 
classes (one per block) are of type A and the null hypothesis asserts that, for 
each block, the joint distribution of the observations is invariant under all per- 
mutations of the names of treatments within each block. Also, within a block, 
the locations are eligible for all the treatments and are randomly assigned to 
these treatments. 

This elementary situation can be generalized in several respects through the
use of partitioning. First, various combinations of treatments can be selected
for comparison within a block. Second, the eligible locations associated with 
the classes of the partition can be chosen in many ways. Finally, the part of 
the null hypothesis that pertains to a given block need not consider all the treat- 
ments of this block. In fact, only the treatments of the partition classes of type 
A are considered. The null hypothesis asserts that the joint distribution of the 
observations for a block is invariant under all permutations of the names of 
treatments within a partition class for all the classes of type A. That is, excluding 
partition classes of type B, under the null hypothesis the treatments within a 
partition class have identical joint probability properties. 

The procedure of including treatments in a block which are not considered 
in the part of the null hypothesis pertaining to this block serves a useful purpose. 
For the situations of this paper, a treatment is included in the experiment for 
one or both of two reasons. First, the question of whether this treatment is identi- 
cal with a specified one or more other treatments can be of interest. This type of 
relation is considered in selection of the null hypothesis $H_0$. Second, there can be
interest in a given form of interrelation that might exist between this treatment 
and specified other treatments when the null hypothesis is false. This second type 
of relation is considered in identifying the alternative hypotheses that are of 
principal interest. For each block, those treatments which are included exclusively
for the second reason are placed in one or more partition classes of type
B and are not considered in the part of the null hypothesis that is associated with
this block. The reason for using more than one partition class for these treatments
(e.g., a separate partition class for each treatment) is that they may not
have the same set of eligible locations. 

The choice of the eligible locations for the various classes of the partition is 
at the discretion of the experimenter. Often this freedom of choice in specifying 
eligible locations can be exploited to obtain a more efficient experiment. As an 
example, for the treatments of some partition classes, location in one part of 
the block may be more important than location in other parts, because of a 
special condition that exists in this part. There is great freedom in selecting 
eligible locations for treatments, subject to the condition that the size of each 
set of eligible locations is at least as great as the number of treatments that 
could be assigned to this set. In particular, two or more classes of the partition 
might have over-lapping sets of eligible locations. For this case, it is convenient, 
but not necessary, to require that if any two classes of the partition are to have 
at least one eligible location in common they must have the same set of eligible 
locations. Then, when two or more of the partition classes have the same set of 
eligible locations, their treatments can be handled as a group in performing the 
random assignment to eligible locations. This "grouping and then assigning"
procedure greatly simplifies the random assignment scheme for situations of this 
nature. If desired, a specified location can be assigned to each treatment of a 
partition class of type B. 

To perform the test, a statistic is specified for each block. This statistic depends
on all the treatments for this block but not on those for any of the other
blocks. These statistics are chosen so that they have symmetrical distributions 
about zero when the null hypothesis is true. They are also chosen so that the 
test is sensitive to the alternative hypotheses that are emphasized. The forms of 
the statistics can vary from block to block and this freedom can sometimes be 
exploited by tailoring the statistics to the special situations that exist for the
blocks. Because of the broad nature of the situations considered, no generally 
applicable rules can be stated for choosing the block statistics so as to emphasize 
the alternative hypotheses of major interest. However, in many cases a reasonable 
selection can be made on an intuitive basis. Some examples of the selection of 
block statistics are given in Section 4. Since the observations are independent 
between blocks, the block statistics are independent and also, under the null 
hypothesis, have symmetrical distributions about zero. Consequently the null 
hypothesis can be tested by use of an appropriate nonparametric test of sym- 
metry about zero. References to some nonparametric tests of symmetry about 
zero are given in Section 3. 

No quantitative attempt is made to evaluate the efficiencies of the tests that 
can be obtained on the basis of this paper. The great freedom allowed in selecting 
the treatment partition classes, the eligible locations, and the block statistics, 
combined with the myriad of possible alternative hypotheses and the different 
kinds of tests that could be used, makes such an investigation infeasible. Qualitative
considerations, however, hint that in many cases the efficiency should be
reasonably high if the eligible assignment locations and the block statistics are 
chosen so that the alternative hypotheses of major interest are emphasized. For 
example, if the normality model for experimental design holds and the treatment 
comparisons are linear, the best test is that based on the appropriate t-statistic. 
A situation of this nature was examined in [1], under conditions that represent a 
special case of the results of this paper. The tests based on the block statistics 
used in [1] were found to have efficiencies that are only in the neighborhood of 
60-70 percent for the case of normality and linear comparisons. However, if 
the most appropriate treatment comparison for the alternative hypotheses of 
interest is not linear, suitable selection of the block statistics so as to emphasize 
these alternative hypotheses may furnish the basis for nonparametric tests that 
are much more efficient than the best t-tests based on linear comparisons. Of 
course, if the block statistics and the eligible locations are poorly chosen, a test 
of this type can have a very low efficiency. 

It is no loss of generality to suppose, for purposes of formal theory, that the 
treatments of a block are possibly different, or at least have different names; 
also to suppose that different blocks can contain different treatments. Situations 
where treatments are replicated or where the same treatments occur in several 
blocks represent special cases of this general situation. 

The null hypothesis of treatment equivalence for specified treatment parti- 
tions can be generalized. Instead of specifying that, for the partition classes 
whose treatments are named in $H_0$ (i.e., class A), the treatments of a partition
class have identical probability properties, the null hypothesis could assert
that this is the case if specified transformations are made of these values. By 
use of transformations, the classes of null hypotheses that are available for con- 
sideration and of alternative hypotheses that are emphasized by elementary form 
block statistics can be greatly extended. 

In Section 3 the permissible forms for the block statistics, verification of their 
properties under the null hypothesis, and a statement of how the test makes 
use of these statistics are presented. Section 4 is titled Block Statistic Selection. 
Two examples are given to illustrate the intuitive selection of block statistics 
so as to emphasize specified alternative hypotheses. 


3. Results. The principal purpose of this section is to show that a properly 
chosen block statistic is symmetrically distributed about zero under $H_0$. Consequently,
all the notation occurring in the derivation applies to an arbitrary
but specified block. The random method used to assign the treatments of a 
partition to their eligible locations is described in Section 2 and is not 
repeated here. 

Suppose that treatments $1, 2, \ldots, I$ occur in the block, and that these are
partitioned into $T + 1$ classes. The first $T$ of these classes are of type A and the
last of type B. (If there are several partition classes of type B, nothing is lost
by throwing them together and working conditionally on whatever random
assignments to location may have been made for such type B classes.) The $t$th
set contains $k(t) - k(t - 1)$ treatments, with $k(0) = 0$ and $k(T + 1) = I$.
The treatments for the $t$th set are denoted by

$$i_{k(t-1)+1},\ i_{k(t-1)+2},\ \ldots,\ i_{k(t)} \qquad (t = 1, \ldots, T + 1).$$

The partitioning is done so that sets $1, \ldots, T$ are the partition classes which
are used in the part of the null hypothesis pertaining to this block (i.e., class A),
while the remaining set contains all the treatments that are in partition classes
which do not appear in $H_0$ (class B). In terms of this notation, the null hypothesis
associated with this block asserts that all the treatments of the $t$th set have
identical joint probability properties with respect to the experiment for
$t = 1, \ldots, T$.

Let the random variable $y(i)$ represent the observable result for the $i$th treatment
$(i = 1, \ldots, I)$, where the probability effects from the randomization
and the experimentation are combined to obtain the joint distribution of
$y(1), \ldots, y(I)$. For $1 \le t \le T$, let $\phi_t$ denote an arbitrary permutation of the
numbers $i_{k(t-1)+1}, \ldots, i_{k(t)}$; also let $\phi_{T+1}$ be the identity transformation for
$i_{k(T)+1}, \ldots, i_I$. Use

$$F\{y[i_{k(t-1)+1}], \ldots, y[i_{k(t)}];\ 1 \le t \le T + 1\}$$

to denote the joint cumulative distribution function (cdf) for $y(1), \ldots, y(I)$.
Then, on the basis of the randomization scheme and the null hypothesis,

(1) $$F\{y[i_{k(t-1)+1}], \ldots, y[i_{k(t)}];\ 1 \le t \le T + 1\} = F\{y[\phi_t(i_{k(t-1)+1})], \ldots, y[\phi_t(i_{k(t)})];\ 1 \le t \le T + 1\}.$$

That is, under $H_0$, the joint cdf of $y(1), \ldots, y(I)$ is invariant under all possible
permutations within each of the $T$ sets of treatments that are considered in the
null hypothesis. No moments of any order are assumed to exist for the $y(i)$.

A block statistic is a function of $y(1), \ldots, y(I)$ which is denoted by

$$g\{y[i_{k(t-1)+1}], \ldots, y[i_{k(t)}];\ 1 \le t \le T + 1\}.$$

This function, which is chosen so as to not be identically zero for all values of
the $y(i)$, is required to have the property that there exists a set of permutations
$\phi_1, \ldots, \phi_T, \phi_{T+1}$, where $\phi_{T+1}$ is the identity permutation, such that

$$g\{y[i_{k(t-1)+1}], \ldots, y[i_{k(t)}];\ 1 \le t \le T + 1\} = -g\{y[\phi_t(i_{k(t-1)+1})], \ldots, y[\phi_t(i_{k(t)})];\ 1 \le t \le T + 1\}.$$

But, on the basis of relation (1),

$$g\{y[i_{k(t-1)+1}], \ldots, y[i_{k(t)}];\ 1 \le t \le T + 1\}$$

and

$$g\{y[\phi_t(i_{k(t-1)+1})], \ldots, y[\phi_t(i_{k(t)})];\ 1 \le t \le T + 1\}$$

have the same distribution. Thus $-g$ has the same distribution as $g$ if the null
hypothesis holds; consequently, under $H_0$, the block statistic $g$ has a probability
distribution that is symmetrical about zero.

Since, by hypothesis, the observations are statistically independent between 
blocks, the block statistics are a set of independent random variables with dis- 
tributions that are symmetrical about zero if the null hypothesis is true. A wide 
variety of nonparametric procedures are available for testing the symmetry of 
populations about zero. These include the signed-rank test of Wilcoxon [2], 
[3], [4], [5], the Fisher test [6], Nair’s test [7], a comprehensive set of tests by 
Hemelrijk [8] and by van Eeden and Benard [9], and the results of [10], [11]. If 
the distributions of the block statistics are not all continuous, tests based on the 
assumption of continuity can be validly used by appropriate randomization of 
ties. Alternatively, some of the tests are valid for both discrete and continuous
populations (see, e.g., [6], [7], [8], [9]). 
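
As an illustration of this final step, the following Python sketch computes an exact one-sided significance level for symmetry about zero by enumerating every assignment of signs to the observed block statistics, in the spirit of a randomization test of the Fisher type. The function name `sign_flip_p_value` and the use of the plain sum as the test criterion are assumptions made here for illustration; they are not taken from the references cited above, and full enumeration is only practical for a small number of blocks.

```python
from itertools import product


def sign_flip_p_value(block_stats):
    """One-sided exact p-value for symmetry about zero.

    Under the null hypothesis each block statistic is independently
    symmetric about zero, so every pattern of signs attached to the
    observed magnitudes is equally likely. The criterion used here
    (an illustrative choice) is the sum of the block statistics;
    large positive sums count against the null hypothesis.
    """
    observed = sum(block_stats)
    magnitudes = [abs(x) for x in block_stats]
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(magnitudes)):
        total += 1
        value = sum(s * m for s, m in zip(signs, magnitudes))
        # Small tolerance guards against floating-point ties.
        if value >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


if __name__ == "__main__":
    # Hypothetical block statistics from six blocks.
    print(sign_flip_p_value([2.1, 0.4, 1.7, -0.3, 0.9, 1.2]))
```

With n blocks the enumeration visits 2^n sign patterns, so for larger experiments a rank-based procedure such as the signed-rank test is the more practical choice.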

The efficiency of this testing procedure depends on the test used, the forms of 
the block statistics, the partitioning scheme, and the choice of eligible locations 
for treatments. In particular, the forms of the block statistics have a strong 
influence on which alternative hypotheses are emphasized. The next section 
considers intuitively the problem of choosing the forms of the block statistics so 
as to emphasize specified types of alternative hypotheses. 


4. Block statistic selection. The great freedom in selecting the forms for the 
block statistics allows so many types of situations to arise that no general rule 
for the selection of these statistics seems to be available. The alternative hypothe- 
ses which are eligible for consideration are of such a wide class that determination 
of a general method of selecting a block statistic so as to emphasize an arbitrary 
but specified alternative hypothesis (or hypotheses) does not appear to be feasible.
However, a reasonable (but not necessarily preferable) selection can often be 
made on the basis of judgment combined with intuitive considerations. Two 
examples are given which illustrate the intuitive method of selecting block 
statistics and which are somewhat typical of situations of practical interest. 
In these examples, the same form is considered to be usable for all blocks. How- 
ever, since the considerations are on the basis of a single block, these considera- 
tions also apply to cases where the forms may change from block to block. 

First example. Let $I = 8$ and suppose that the null hypothesis asserts that
treatment 1 is equivalent to treatment 2 and that treatments 3-6 are equivalent.
Treatments 7 and 8 do not occur in the statement of the null hypothesis. The
three alternative hypotheses of principal interest are

$H_1$: The value of treatment 1 tends to be larger than that of treatment 2, but
small deviations are not important.

$H_2$: The average of the values of treatments 3 and 4 tends to be smaller than
the average of the values of treatments 5 and 6.

$H_3$: The value for treatment 1 minus that for treatment 2 tends to be negative
and simultaneously the average of the values of treatments 3-8 tends to
exceed 10.

If all of $H_1$-$H_3$ hold, or if at least one holds in a strong fashion and neither of
the one-sided H's holds strongly in a negative sense, it is highly desirable that
the null hypothesis be rejected.

For this case, use of the function

$$\{y[1] - y[2]\}^3 - \tfrac{1}{2}\{y[3] + y[4] - y[5] - y[6]\} - \tfrac{1}{6}\{y[3] + \cdots + y[8] - 60\}\,\mathrm{sgn}\{y[1] - y[2]\}$$

for $g$, combined with an appropriate one-sided test for symmetry about zero
(which is sensitive to large positive values of the variable) might be satisfactory.
The first term accounts for the alternative $H_1$, the second term for $H_2$, and the
third term for $H_3$. The permutations

$$\phi_1: 1 \leftrightarrow 2; \qquad \phi_2: 3 \leftrightarrow 5 \ \text{and}\ 4 \leftrightarrow 6$$

result in a change of sign for $g$.
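
The sign change can be checked numerically. The sketch below assumes the statistic is read as the cube of y[1] - y[2], minus one half of {y[3] + y[4] - y[5] - y[6]}, minus one sixth of {y[3] + ... + y[8] - 60} times the sign of y[1] - y[2]; the exponent and the two coefficients are a reading of the printed formula and should be treated as assumptions. The function names `g` and `permute` and the sample values are likewise hypothetical.

```python
def sgn(x):
    """Sign function: -1, 0 or 1."""
    return (x > 0) - (x < 0)


def g(y):
    """Block statistic of the first example (assumed reading).

    y maps the treatment number (1..8) to its observed value.
    """
    d12 = y[1] - y[2]
    term1 = d12 ** 3                                    # emphasizes H1
    term2 = -0.5 * (y[3] + y[4] - y[5] - y[6])          # emphasizes H2
    term3 = -(sum(y[i] for i in range(3, 9)) - 60) / 6 * sgn(d12)  # H3
    return term1 + term2 + term3


def permute(y):
    """Apply 1 <-> 2, 3 <-> 5 and 4 <-> 6; treatments 7 and 8 stay fixed."""
    swap = {1: 2, 2: 1, 3: 5, 5: 3, 4: 6, 6: 4, 7: 7, 8: 8}
    return {i: y[swap[i]] for i in y}


if __name__ == "__main__":
    # Hypothetical observations for one block.
    y = {1: 4.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 7.0, 6: 8.0, 7: 20.0, 8: 30.0}
    print(g(y), g(permute(y)))  # the two values differ only in sign
```

Treatments 7 and 8 enter only through the sum in the third term, which is unchanged by the permutations, so they need not be permuted for the sign of g to reverse.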

Second example. Let $I = 4$. The null hypothesis asserts that all four treatments
are equivalent. The alternative hypothesis of principal interest is

$H_1$: The sign of the value of treatment 1 minus that of treatment 2 tends
to be the same as that of the value of treatment 3 minus that of treatment 4.
Also the magnitude of the difference involving treatments 1 and 2
tends to exceed that of the difference involving treatments 3 and 4.

For this case, use of the function

$$\{y[1] - y[2]\}/\{y[3] - y[4]\}$$

for $g$, combined with an appropriate one-sided test of symmetry about zero
(which is sensitive to large positive values of the variable), would appear to
be suitable. The permutation

$$\phi_1: 1 \leftrightarrow 2,\ 3 \leftrightarrow 3,\ \text{and}\ 4 \leftrightarrow 4$$

results in a change of sign for $g$.


A GENERALISATION OF PARTIALLY BALANCED INCOMPLETE
BLOCK DESIGNS


By B. V. Shah
University of Bombay


1. Introduction. The partially balanced incomplete block (PBIB) designs 
were first defined by Bose and Nair [2] in 1939. Later on, in 1942, Nair and Rao 
[7] generalised the original definition to include some confounded factorial de- 
signs as well as many others in the class of PBIB designs. The class of PBIB 
designs was found to include most of the designs used in practice. In 1946, 
Harshberger [4] presented triple rectangular lattices, and Nair [6] proved that 
these designs were not in general PBIB designs, but that the duals of these 
designs were PBIB designs. So it was found that, except for the intra-inter- 
group balanced designs given by Nair and Rao [8], almost all the designs so far 
proposed, with a limited number of distinct variances for elementary treatment
comparisons, were either PBIB designs or duals of PBIB designs. Yet a need 
was felt to find a more general class of designs. In an attempt to find out why 
the PBIB designs with m associate classes have m distinct types of treatment 
comparisons, I came across a more general class of designs, which is given in 
this paper. The arguments which led to this generalisation are also put forward. 


2. Notation. Let there be $v$ treatments, each replicated $r$ times in $b$ blocks of
$k$ plots each. Let $N = [n_{ij}]$ $(i = 1, 2, \ldots, v;\ j = 1, 2, \ldots, b)$ be the incidence
matrix of the design, where $n_{ij}$ is equal to the number of times the $i$th treatment
occurs in the $j$th block. It is assumed that $n_{ij}$ is 0 or 1. The assumed model
is

(2.1) $y_{ij} = \mu + \beta_j + t_i + e_{ij},$

where $y_{ij}$ is the yield of the plot in the $j$th block to which the $i$th treatment is
applied, $\mu$ is the general effect, $\beta_j$ is the effect of the $j$th block, $t_i$ is the effect of
the $i$th treatment and the $e_{ij}$'s are independent normal variates with mean 0 and
variance $\sigma^2$. Let $T_i$ be the total yield of all the plots having the $i$th treatment,
$B_j$ be the total yield of all the plots of the $j$th block and $\hat{t}_i$ be a solution for $t_i$
in the normal equations. Further denote the column vectors $\{T_1, T_2, \ldots, T_v\}$,
$\{B_1, B_2, \ldots, B_b\}$, $\{t_1, t_2, \ldots, t_v\}$ and $\{\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_v\}$ by $T$, $B$, $t$ and $\hat{t}$ respectively.
It is well known that the reduced normal equations for the intra-block
estimates of the treatment contrasts are

(2.2) $Q = C\hat{t},$

where

(2.3) $Q = T - \frac{1}{k} N B$

and

(2.4) $C = r I(v) - \frac{1}{k} N N',$

where $I(v)$ is the $v \times v$ identity matrix. The matrix $C$ defined in (2.4) will be
called the C-matrix of the design. Denote by $E(m, n)$ the $m \times n$ matrix with
all its elements equal to 1.

Lemma 2.1: If the design is connected, the matrix $C + aE(v, v)$ is non-singular,
where $a$ is any non-zero real number, and $\hat{t} = [C + aE(v, v)]^{-1}Q$ is a solution
of the equation $Q = C\hat{t}$.

Proof: Let $\theta_1, \theta_2, \ldots, \theta_v$ be the canonical roots of the C-matrix and let
$l_1, l_2, \ldots, l_v$ be the corresponding canonical vectors. It is well known that the
C-matrix has one root 0 and that the corresponding canonical vector is
$(v)^{-1/2}E(v, 1)$; denote these by $\theta_1$ and $l_1$ respectively. Then

(2.7) $C = \sum_{i=2}^{v} \theta_i l_i l_i'$

and

(2.8) $C + aE(v, v) = \sum_{i=2}^{v} \theta_i l_i l_i' + a v\, l_1 l_1'.$

Since the design is connected, the rank of the C-matrix is $v - 1$, and therefore
none of the $\theta_i$'s except $\theta_1$ is 0. Hence from (2.8) it follows that the matrix
$C + aE(v, v)$ is non-singular and

(2.9) $[C + aE(v, v)]^{-1} = \sum_{i=2}^{v} \frac{1}{\theta_i} l_i l_i' + \frac{1}{av}\, l_1 l_1'.$

Also

(2.10) $C[C + aE(v, v)]^{-1} = \sum_{i=2}^{v} l_i l_i' = I(v) - \frac{1}{v} E(v, v).$

Hence, since $\sum_i Q_i = 0$, $[C + aE(v, v)]^{-1}Q$ is a solution for $\hat{t}$ in the equation
$Q = C\hat{t}$.

Lemma 2.2: If $\hat{t} = AQ$ is a solution of $Q = C\hat{t}$, then $[A + aE(v, v)]Q$ is
also a solution of $Q = C\hat{t}$.
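
Both lemmas can be checked in exact arithmetic on a small connected design. The sketch below is illustrative only: it assumes a binary, equireplicate design with v = 6, b = 9, r = 3, k = 2 (blocks listed in `BLOCKS`), and a hypothetical vector `Q` whose elements sum to zero. The helper names are not from the paper.

```python
from fractions import Fraction

# Assumed design: 9 blocks of 2 plots on 6 treatments, each replicated 3 times.
BLOCKS = [(1, 2), (3, 4), (5, 6), (1, 2), (3, 4), (5, 6), (1, 6), (2, 3), (4, 5)]
V, K, R = 6, 2, 3
Q = [3, -1, 2, -4, 1, -1]  # hypothetical adjusted totals; they sum to zero


def c_matrix():
    """C = r I(v) - (1/k) N N' for a binary equireplicate design."""
    c = [[Fraction(R if i == j else 0) for j in range(V)] for i in range(V)]
    for block in BLOCKS:
        for a in block:
            for b in block:
                c[a - 1][b - 1] -= Fraction(1, K)
    return c


def inverse(m):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(row for row in range(col, n) if aug[row][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for row in range(n):
            if row != col and aug[row][col] != 0:
                f = aug[row][col]
                aug[row] = [x - f * y for x, y in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def solve(a):
    """Lemma 2.1: t = [C + a E(v, v)]^{-1} Q for a non-zero a."""
    c = c_matrix()
    shifted = [[c[i][j] + a for j in range(V)] for i in range(V)]
    return matvec(inverse(shifted), Q)


def shifted_solution(a, extra):
    """Lemma 2.2: use A + extra * E(v, v) with A = [C + a E(v, v)]^{-1}."""
    c = c_matrix()
    a_inv = inverse([[c[i][j] + a for j in range(V)] for i in range(V)])
    return matvec([[x + extra for x in row] for row in a_inv], Q)


if __name__ == "__main__":
    t = solve(1)
    print(t, matvec(c_matrix(), t) == Q)
```

Because the elements of Q sum to zero, the term contributed by E(v, v) vanishes, which is why every non-zero a gives the same solution vector here.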


3. PBIB designs. An incomplete block design is said to be partially balanced 
(PBIB) if it satisfies the following conditions (Bose and Shimamoto [3]): 

(i) The experimental material is divided into b blocks of k plots each, different 
treatments being applied to the plots in the same block. 

(ii) There are v treatments each of which occurs in r blocks. 

(iii) There can be established relations of association between any two treat- 
ments satisfying the following requirements: 

(a) Two treatments are either 1st, 2nd, ..., or $m$th associates.
(b) Each treatment has exactly $n_i$ $i$th associates $(i = 1, 2, \ldots, m)$.
(c) Given any two treatments which are $i$th associates, the number of
treatments common to the $j$th associates of the first and the $k$th associates
of the second is $p^i_{jk}$ and is independent of the pair of treatments with which
we start. Also $p^i_{jk} = p^i_{kj}$.
(iv) Two treatments which are $i$th associates occur together in exactly
$\lambda_i$ blocks.

Now further define each treatment to be its own 0th associate and the 0th
associate of no other treatment. We may thus consistently write


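(3.1) $\lambda_0 = r, \quad n_0 = 1, \quad p^0_{jk} = n_j \delta_{jk}, \quad p^i_{0k} = p^i_{k0} = \delta_{ik},$


where $\delta_{ij}$ is the Kronecker delta which is defined for all pairs of natural numbers
$i, j$ as $\delta_{ij} = 1$ if $i = j$, and $\delta_{ij} = 0$ if $i \ne j$. Then the relations between the
parameters are


(3.2) $$bk = vr, \qquad \sum_{i=0}^{m} n_i = v, \qquad \sum_{i=0}^{m} n_i \lambda_i = rk, \qquad \sum_{k=0}^{m} p^i_{jk} = n_j,$$
$$n_i p^i_{jk} = n_j p^j_{ik} = n_k p^k_{ij}, \qquad i, j, k = 0, 1, \ldots, m.$$
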
Now consider the $v(v + 1)/2$ treatment pairs $(i, j)$ $(i, j = 1, 2, \ldots, v)$, assuming
that $(i, j)$ is identical with $(j, i)$. Partition them into $(m + 1)$ disjoint
classes and, corresponding to the $t$th class $(t = 0, 1, \ldots, m)$, define the $v \times v$
matrix $B_t = [b^t_{ij}]$, where $b^t_{ij} = 1$ if the pair $(i, j)$ belongs to the $t$th class and
$b^t_{ij} = 0$ otherwise. The classes can be called the association classes and the
corresponding matrices, the association matrices. As there is a one to one
correspondence between the association classes and matrices defined above, either
of them will uniquely determine the other. It can be seen that each $B_t$ is symmetric.
Since every pair must belong to one of the association classes, it is
obvious that

(3.3) $\sum_{t=0}^{m} B_t = E(v, v).$
t=0 

THEOREM 3.1: The necessary and sufficient conditions, that $(m + 1)$ association
matrices $B_0, B_1, \ldots, B_m$ determine an association scheme for an $m$ associate class
PBIB design, are that

(3.4) $B_0 = I(v),$

and

(3.5) $B_t B_x = \sum_{i=0}^{m} p^i_{tx} B_i, \qquad t, x = 0, 1, \ldots, m.$


The proof of the above theorem follows immediately from the definition of 
a PBIB design given by Bose and Shimamoto [3]. 
The idea of association matrices was also developed by Bose and Mesner [1]
independently and became available after submission of the manuscript of this
paper. The reader may note that the concept introduced by Bose and Mesner
is slightly different from the one given here. The idea of association classes and
matrices as introduced by the former is confined only to PBIB designs, whereas, 
my interest being the generalisation of PBIB designs, the association classes 
and matrices are defined in terms of partitioning of v(v + 1)/2 combinations 
of v objects (an object may occur more than once) taken two at a time, into 
(m + 1) mutually exclusive and exhaustive classes. Theorem 3.1 gives a set of 
necessary and sufficient conditions for such a scheme of partitioning to be an 
association scheme of a PBIB. Lemma 3.1 of Bose and Mesner [1] proves the 
necessary part of the condition; the sufficiency is proved by Lemma 5.1 of [1]. 

Before deriving further results, it is necessary to prove the following matrix 
theorem. 

THEOREM 3.2: If $A$ is a $v \times v$ positive definite matrix, such that all the
non-negative integral powers of $A$ are of the form

(3.6) $A^N = \sum_{i=0}^{m} u_{Ni} B_i, \qquad N = 0, 1, 2, \ldots,$

where $u_{Ni}$ are scalar constants and $B_i$ are fixed $v \times v$ matrices and $A^0$ means $I(v)$,
then the matrix $A^{-1}$ must also be of the form $\sum_{i=0}^{m} d_i B_i$, where $d_i$ are scalar constants.

Proof: Let $\theta$ be the maximum of the canonical roots of the matrix $A$. Then
the canonical roots of the matrix $B = I(v) - \{1/(\theta + 1)\}A$ lie within the
range 0 and 1. Now consider the series

(3.7) $D = \sum_{N=0}^{\infty} B^N.$

The above series converges because the series $\sum x^N$ converges for $-1 < x < 1$
and the canonical roots of $B$ lie within the range (Macdufee [5]). Also it can
be shown that

(3.8) $AD = (\theta + 1) I(v) = DA,$

hence

(3.9) $A^{-1} = \frac{1}{\theta + 1} D.$

Now since every power of $A$ is a linear combination of matrices $B_0, B_1, \ldots, B_m$,
the same is true for every power of $B$ and hence $D$ is also a linear combination
of the matrices $B_0, B_1, \ldots, B_m$.
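
The series argument of the proof can be illustrated numerically. The sketch below forms B = I - A/(theta + 1), accumulates partial sums of the series (3.7), and divides by theta + 1 as in (3.9). The 2 x 2 matrix, the value theta = 3 (its largest canonical root), the number of terms, and the function names are all assumptions chosen for illustration.

```python
def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse_by_series(a, theta, terms=400):
    """Approximate A^{-1} as D / (theta + 1), D = sum of B^N, N = 0..terms-1.

    theta should be the largest canonical root of the positive definite
    matrix a, so that the roots of B = I - a / (theta + 1) lie between
    0 and 1 and the series converges.
    """
    n = len(a)
    b = [[(i == j) - a[i][j] / (theta + 1) for j in range(n)] for i in range(n)]
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # B^0
    d = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        d = [[d[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, b)
    return [[d[i][j] / (theta + 1) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 2.0]]  # canonical roots 1 and 3
    print(inverse_by_series(A, theta=3.0))  # close to [[2/3, -1/3], [-1/3, 2/3]]
```

Every partial sum is a polynomial in A, which is the point of the proof: if all powers of A are linear combinations of the fixed matrices, so is the limit.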

COROLLARY 3.2.1: If there exist matrices $B_0, B_1, \ldots, B_m$, such that $I(v)$,
$E(v, v)$, and all the positive integral powers of the C-matrix of a connected design
are linear combinations of the matrices $B_0, B_1, \ldots, B_m$, then there exists a solution
$\hat{t} = AQ$ of the equation $Q = C\hat{t}$, such that the matrix $A$ is a linear combination
of the matrices $B_0, B_1, \ldots, B_m$, and also

(3.10) $AC = CA = I(v) - \frac{1}{v} E(v, v).$

The proof of Corollary 3.2.1 follows immediately from Theorem 3.2 and
Lemma 2.1.

The C-matrix of a PBIB design can be written in the form

(3.11) $C = \frac{r(k - 1)}{k} B_0 - \sum_{i=1}^{m} \frac{\lambda_i}{k} B_i,$

where $B_t$ is the association matrix corresponding to the $t$th associate class
$(t = 0, 1, \ldots, m)$. Using relation (3.5) and mathematical induction, it can be
proved that all the powers of $C$ are linear combinations of $B_0, B_1, \ldots, B_m$;
also $I(v) = B_0$ and $E(v, v) = \sum_{i=0}^{m} B_i$. Hence, by Corollary 3.2.1, it follows
that a solution $\hat{t} = AQ$ exists, such that

(3.12) $A = \sum_{i=0}^{m} d_i B_i.$

With a little algebra, it can be shown that the $d_i$'s are the solutions of the
equations

(3.13) $$\sum_{j=0}^{m} \sum_{x=0}^{m} p^i_{jx}\, c_j d_x = \begin{cases} 1 - \dfrac{1}{v}, & \text{if } i = 0; \\[4pt] -\dfrac{1}{v}, & \text{if } i = 1, 2, \ldots, m, \end{cases}$$

where

(3.14) $c_0 = \frac{r(k - 1)}{k}, \qquad c_i = -\frac{\lambda_i}{k}, \qquad i = 1, 2, \ldots, m.$

Since the $m + 1$ equations in (3.13) are not independent, any $m$ of them can
be taken and solved with an additional convenient restriction like $\sum d_i = 0$,
or, for some $j$, $d_j = 0$. It can be verified that the solutions obtained by taking
$d_j = 0$ will be identical with those obtained by Bose and Nair [2].


4. Restrictions on association matrices.

Lemma 4.1: If $C = \sum_{i=0}^{m} c_i B_i$ and if

(4.1) $B_t B_x + B_x B_t = 2 \sum_{i=0}^{m} q^i_{tx} B_i,$

for all $x, t = 0, 1, \ldots, m$, then

(4.2) $C^N = \sum_{i=0}^{m} u_{Ni} B_i,$

for all positive integral values of $N$.

Proof: The lemma is true for $N = 1$. Assuming the result to be true for $N$,
it can be proved for $N + 1$ as follows:
Since a matrix commutes with its powers,

(4.3) $C^{N+1} = C^N C = C C^N.$

Therefore

(4.4) $C^{N+1} = \tfrac{1}{2}(C^N C + C C^N).$

On applying (4.2) for $N$, (4.4) becomes

(4.5) $C^{N+1} = \tfrac{1}{2} \sum_{j=0}^{m} \sum_{i=0}^{m} u_{Ni} c_j (B_i B_j + B_j B_i).$

Hence substituting for $B_i B_j + B_j B_i$ from (4.1),

(4.6) $C^{N+1} = \sum_{t=0}^{m} \Big\{ \sum_{i=0}^{m} \sum_{j=0}^{m} u_{Ni} c_j q^t_{ij} \Big\} B_t.$

Hence by mathematical induction Lemma 4.1 is proved.

THEOREM 4.1: If the C-matrix of a connected design is $C = \sum_{i=0}^{m} c_i B_i$, and the
matrices $B_0, B_1, \ldots, B_m$ are the association matrices of the design satisfying the
conditions $B_0 = I(v)$ and $B_t B_x + B_x B_t = 2 \sum_{i=0}^{m} q^i_{tx} B_i$, then the analysis of the design
will be identical with that of a PBIB design.

Proof: From Corollary 3.2.1 and Lemma 4.1, it follows that a solution
$\hat{t} = AQ$ of $Q = C\hat{t}$ exists such that

(4.7) $A = \sum_{i=0}^{m} d_i B_i$

and

(4.8) $I(v) - \frac{1}{v} E(v, v) = AC = CA = \tfrac{1}{2}(AC + CA).$

Simplifying both the sides in terms of the $B_t$'s, we get

(4.9) $B_0 - \frac{1}{v} \sum_{t=0}^{m} B_t = \sum_{t=0}^{m} \Big\{ \sum_{j=0}^{m} \sum_{i=0}^{m} c_i d_j q^t_{ij} \Big\} B_t.$

Hence, on equating the coefficients of the matrices $B_t$ on both sides of the equation,
the $d_i$'s are given by a solution of the equations

(4.10) $$\sum_{i=0}^{m} \sum_{j=0}^{m} q^t_{ij}\, c_i d_j = \begin{cases} 1 - \dfrac{1}{v}, & \text{if } t = 0; \\[4pt] -\dfrac{1}{v}, & \text{if } t = 1, 2, \ldots, m. \end{cases}$$

On comparing equations (4.10) and (3.13), they are seen to be identical except
for a change of notation. This implies that one can obtain exactly the same
analysis as that of a PBIB design (Bose and Nair [2]), even if the condition
(3.5) is replaced by the less stringent condition (4.1).

The combinatorial implication of the condition (4.1) is the following: If two 
treatments are ith associates, then the number of treatments common between 
the jth associates of the first and the kth associates of the second, plus the 
number of treatments common between the kth associates of the first and the 
jth associates of the second, is equal to $2q^i_{jk}$, and is the same for all the pairs
of treatments which are ith associates. 

Hence the above condition can replace the condition (iiic) of the definition 
of a PBIB design given by Bose and Shimamoto [3], and the analysis of the
design will remain the same. In the case of two associate classes the two con- 
ditions are equivalent, but in general they are not. 

Example 4.1: Consider the following design with parameters: $v = 6$, $b = 9$,
$r = 3$, $k = 2$, $m = 4$, $n_1 = n_2 = n_3 = 1$, $n_4 = 2$, $\lambda_1 = 2$, $\lambda_2 = 1$, $\lambda_3 = \lambda_4 = 0$.
The plan of the design is given in Table 4.1 and the association scheme in
Table 4.2.

Now consider the treatments 1 and 3. The number of treatments common
between the 1st associates of 1 and the 2nd associates of 3 is one, whereas there
is no treatment common between the 1st associates of 3 and the 2nd associates
of 1. Hence it is clear that this design is not a PBIB as defined by [3], but it
can be verified that the design satisfies the condition given in (4.1) and that
some of the $q^i_{jk}$ are


(4.11) $q^4_{12} = \tfrac{1}{2} = q^4_{23} = q^4_{31}.$


One observes that the above example is obtained by taking two X-replications 
and one Y-replication of a 3 X 2 simple rectangular lattice design (Harshberger 
[5]). A similar result will be obtained for any design formed by taking $r_1$ X-replications
and $r_2$ Y-replications ($r_1 \ne r_2$) of a $p(p - 1)$ simple rectangular lattice
design; but, in general, when $p > 3$, there will be five associate classes.
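
The claims made for Example 4.1 can be verified by direct enumeration. The sketch below encodes the association scheme as read from Table 4.2 (an assumption insofar as the printed table must be transcribed), and uses the fact that the (a, b) element of $B_j B_k$ equals the number of treatments that are jth associates of a and kth associates of b. Condition (3.5) requires this count to depend only on the associate class of (a, b); condition (4.1) requires the same only of the symmetrised count. The function names are hypothetical.

```python
from fractions import Fraction

# Association scheme as read from Table 4.2:
# treatment -> (1st, 2nd, 3rd, 4th associates).
ASSOC = {
    1: ({2}, {6}, {4}, {3, 5}),
    2: ({1}, {3}, {5}, {4, 6}),
    3: ({4}, {2}, {6}, {1, 5}),
    4: ({3}, {5}, {1}, {2, 6}),
    5: ({6}, {4}, {2}, {1, 3}),
    6: ({5}, {1}, {3}, {2, 4}),
}
M = 4


def assoc(a, j):
    """Set of jth associates of treatment a; each treatment is its own 0th."""
    return {a} if j == 0 else ASSOC[a][j - 1]


def cls(a, b):
    """Index of the associate class containing the pair (a, b)."""
    return next(j for j in range(M + 1) if b in assoc(a, j))


def common(a, b, j, k):
    """(a, b) element of B_j B_k."""
    return len(assoc(a, j) & assoc(b, k))


def constant_on_classes(f):
    """True if f(a, b) takes a single value on each associate class."""
    seen = {}
    for a in ASSOC:
        for b in ASSOC:
            if seen.setdefault(cls(a, b), f(a, b)) != f(a, b):
                return False
    return True


def holds(symmetrised):
    """Check (4.1) if symmetrised is True, otherwise check (3.5)."""
    for j in range(M + 1):
        for k in range(M + 1):
            def count(a, b):
                n = common(a, b, j, k)
                return n + common(a, b, k, j) if symmetrised else n
            if not constant_on_classes(count):
                return False
    return True


def q(i, j, k):
    """q^i_jk, computed from the first pair found in class i."""
    a, b = next((a, b) for a in ASSOC for b in ASSOC if cls(a, b) == i)
    return Fraction(common(a, b, j, k) + common(a, b, k, j), 2)


if __name__ == "__main__":
    print("(3.5) holds:", holds(symmetrised=False))
    print("(4.1) holds:", holds(symmetrised=True))
    print("q^4_12 =", q(4, 1, 2))
```

The value returned by `q` is meaningful only when (4.1) holds, since otherwise the count would depend on which pair in the class is chosen.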


5. Further generalisation. From the foregoing arguments, we can see that an 
analysis almost similar to that of a PBIB can be derived from only the assump- 
tions that association matrices satisfy the condition (4.1) and that the C-matrix 
and I(v) are linear combinations of the association matrices. Hence, instead of 
taking By) = I(v), we can think of some association matrices yielding I(v) as 
their linear combination. This will lead to partitioning treatments into several 
groups and finally, to the following definition: 

Definition 5.1: In an incomplete block design, partial balance over intra- and
inter-group treatment comparisons will be achieved, if the following conditions 
are satisfied: 

(i) The experimental material is divided into b blocks of k plots each, different 
treatments being applied to the units in the same block. 

TABLE 4.1.
Plan of the design

Replication |     1     |     2     |     3
Block       | 1   2   3 | 4   5   6 | 7   8   9
Treatments  | 1   3   5 | 1   3   5 | 1   2   4
            | 2   4   6 | 2   4   6 | 6   3   5

TABLE 4.2.
Association scheme

            |        Associates
Treatment   | 1st   2nd   3rd   4th
1           |  2     6     4    3, 5
2           |  1     3     5    4, 6
3           |  4     2     6    1, 5
4           |  3     5     1    2, 6
5           |  6     4     2    1, 3
6           |  5     1     3    2, 4

(ii) There are $v$ treatments divided into $h$ groups of $n_1, n_2, \ldots, n_h$ treatments
respectively; the treatments of the $i$th group occur in exactly $r_i$ blocks.
(iii) There can be established relations of association between any two treatments
satisfying the following requirements:

(a) A treatment of the $i$th group and a treatment of the $j$th group are
either $ij{:}1$th, $ij{:}2$th, ..., or $ij{:}m_{ij}$th associates $(i, j = 1, 2, \ldots, h)$; $ij{:}t$th
associates are the same as $ji{:}t$th associates.

(b) Each treatment of the $i$th group has exactly $n_{ij:t}$ $ij{:}t$th associates
$(j = 1, 2, \ldots, h;\ t = 1, 2, \ldots, m_{ij})$ and has zero $lk{:}t$th associates $(l \ne i,\ k \ne i)$.

(c) Given any two treatments which are $ij{:}t$th associates, the number
of treatments common to the $i_1j_1{:}t_1$th associates of the first and $i_2j_2{:}t_2$th
associates of the second plus the number of treatments common to the
$i_2j_2{:}t_2$th associates of the first and $i_1j_1{:}t_1$th associates of the second is
$2q_{ij:t}(i_1j_1{:}t_1,\ i_2j_2{:}t_2)$ and is independent of the pair of treatments with
which we start.

(iv) Two treatments which are $ij{:}t$th associates occur together in exactly
$\lambda_{ij:t}$ blocks.

Because of the treatment groupings the condition (iiic) of Definition 5.1 can
be expressed as follows:

(d) Given any two treatments which are $ij{:}t$th associates $(i \ne j)$,
the first belonging to the $i$th group and the second belonging to the $j$th
group, the number of treatments common to the $ik{:}t_1$th associates of the first
and the $jk{:}t_2$th associates of the second is equal to $2q_{ij:t}(ik{:}t_1,\ jk{:}t_2)$ and is
independent of the pair of treatments with which we start. Also given any
two treatments which are $ii{:}t$th associates, the number of treatments
common to the $ik{:}t_1$th associates of the first and $ik{:}t_2$th associates of the second
plus the number of treatments common to the $ik{:}t_2$th associates of the first
and $ik{:}t_1$th associates of the second is equal to $2q_{ii:t}(ik{:}t_1,\ ik{:}t_2)$ and is
independent of the pair of treatments with which we start.

In these designs the total number of associate classes $m$ is given by

(5.5) $m = \sum m_{ij}.$
The relations between the parameters are 


h 
Dm = 6, 


t=1 
(5.6) i LD ry inj; 
t=0 te] 
Mik 
>, Qi(tkih , tk:l) = nase, , 
l=1 
Mik 
23° qij-likih , jkil) = nay, 8 éimtg. 


i=1 
Ni; Qij:t(tkil, jkih) - Nin :rQix:1( 27 +t, jkzh), 
ifixj,iXk. 


If B,;.. denotes the association matrix corresponding to the 7j:tth associate 
class, then 


(5.7) B,, j, Bis i te + Bi, ie:teiniy:t = 2>-* Gij:t(hjith ; too: te) Bi; t> 


where >.* denotes the summation over all the possible values of ij:t. 
Also, the C matrix can be written in the form 


(5.8) = >* Cij (Bij:t , 


where 


| 


ijt = r(k — 1)/k, ifi = jandt = 0; 
(5.9) 


= — \Xij:4/k, otherwise. 


Hence, by Lemma 4.1 and Corollary 3.2.1, the solution of the normal equations 
is given by t = AQ where the matrix A is of the form 


(5.10) A = > * dij (Bij: ; 


and the constants d;;., are given by a solution of the equations 
(5.11) DO! qag:e(tsjutty , tajette)e:, j4:t; igigsty = 1 — v0", if i = j andt = 0; 


—1 . 
—v otherwise, 


where >.’ represents the summation over all the values of ij::t; and izj2:t, 
Now, from Lemmas 2.1 and 2.2, it can be assumed that a solution, such that 
A is orthogonal to the vector E(v, 1), exists, and then 


h 
(5.10) dito + D> 


Nij:t dijse = 0. 
j=l tal 


Hence, using h equations of (5.10), (m + h) equations of (5.11) can be re- 
duced to m equations in m unknowns. So it seems that the analysis of the designs 
given in Definition 5.1 is similar to that of a PBIB design with m associate 
classes. 

In general, these designs involve a large number of associate classes and 
consequently their analysis is complicated. The minimum number of classes m 
is 3, when h = 2; the analysis for this design is given by Nair and Rao [9]. 

Another simple case is the one for which m;; = 1 and Ajj. = A for all 7 ¥ j. 
In this case the inverse of C + (A/k)E(v, v) can be obtained by working out 
the inverses of h diagonal sub-matrices. Further, if m;; = 1 or 2, the computa- 
tional work will be reduced considerably. 
THE NON-EXISTENCE OF CERTAIN PBIB DESIGNS 


By MANOHAR NARHAR VARTAK 
University of Bombay 


1. Introduction. Let N be a Partially Balanced Incomplete Block (PBIB) 
design (cf. Bose and Shimamoto [1]) with three associate classes and with 
parameters 


(1.1)    $v,\ b,\ r,\ k,\ n_i,\ \lambda_i,\ p^i_{ju} \quad (i, j, u = 1, 2, 3).$


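These parameters are not all independent but are connected by the equations 


(1.2)    $$\begin{aligned}
& bk = vr; \qquad \sum_{i=1}^{3} n_i = v - 1; \qquad \sum_{i=1}^{3} n_i \lambda_i = r(k - 1); \\
& p^i_{ju} = p^i_{uj}; \qquad n_i\, p^i_{ju} = n_j\, p^j_{iu} = n_u\, p^u_{ij}; \\
& \sum_{u=1}^{3} p^i_{ju} = n_j - \delta_{ij} \quad (i, j = 1, 2, 3);
\end{aligned}$$


where $\delta_{ij} = 0$ or 1 according as $i \ne j$ or $i = j$ respectively. Additional relations 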
among the parameters (1.1) can be derived if the association scheme of the v 
treatments of N is completely known. Suppose, for example, that the association 
scheme of the given design N is of the rectangular type; that is, let us suppose 
that 


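(1.3)    $v = v_1 v_2 \quad (v_1, v_2 \ge 2),$

and that the treatments $\theta_{ij}$ $(i = 1, 2, \ldots, v_1;\ j = 1, 2, \ldots, v_2)$ of the design N 
can be arranged in the form of a $v_1 \times v_2$ rectangle 


(1.4)    $$\begin{matrix}
\theta_{11}, & \theta_{12}, & \cdots, & \theta_{1 v_2} \\
\theta_{21}, & \theta_{22}, & \cdots, & \theta_{2 v_2} \\
\vdots & \vdots & & \vdots \\
\theta_{v_1 1}, & \theta_{v_1 2}, & \cdots, & \theta_{v_1 v_2}
\end{matrix}$$


so that the first associates of any treatment $\theta_{ij}$ are the other $v_2 - 1$ treatments 
in the ith row; its second associates are the other $v_1 - 1$ treatments in the jth 
column and the remaining $(v_1 - 1)(v_2 - 1)$ treatments are its third associates. 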
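For the design N with the association scheme (1.4) it then follows that the 
matrices $(p^i_{ju})$ are given by 

(1.5)    $$\begin{aligned}
(p^1_{ju}) &= \begin{pmatrix} v_2 - 2 & 0 & 0 \\ 0 & 0 & v_1 - 1 \\ 0 & v_1 - 1 & (v_1 - 1)(v_2 - 2) \end{pmatrix}, \\
(p^2_{ju}) &= \begin{pmatrix} 0 & 0 & v_2 - 1 \\ 0 & v_1 - 2 & 0 \\ v_2 - 1 & 0 & (v_1 - 2)(v_2 - 1) \end{pmatrix}, \\
(p^3_{ju}) &= \begin{pmatrix} 0 & 1 & v_2 - 2 \\ 1 & 0 & v_1 - 2 \\ v_2 - 2 & v_1 - 2 & (v_1 - 2)(v_2 - 2) \end{pmatrix}.
\end{aligned}$$

The relevant additional relations among the parameters (1.1) are, in this case, 

(1.6)    $p^1_{11} = n_1 - 1 = v_2 - 2; \qquad p^2_{22} = n_2 - 1 = v_1 - 2; \qquad n_3 = n_1 n_2.$

The parameters $r, k, \lambda_1, \lambda_2$ and $\lambda_3$ are related to $v_1$ and $v_2$ through the equation 

(1.7)    $r(k - 1) = (v_2 - 1)\lambda_1 + (v_1 - 1)\lambda_2 + (v_1 - 1)(v_2 - 1)\lambda_3,$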


which, in fact, is one of the equations in (1.2) rewritten in the light of (1.6). 
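
The entries of (1.5) can be checked by direct counting on the rectangle (1.4). The following Python sketch is an illustration added here, not part of the original argument: the helper names `rect_class`, `p_matrices` and `p_formula` are hypothetical, and rows and columns are indexed from 0. For a pair of ith associates it counts the treatments that are jth associates of the first and uth associates of the second, and compares the counts with the closed forms.

```python
from itertools import product

def rect_class(a, b):
    """Associate class (1, 2 or 3) of two distinct cells a=(row, col), b=(row, col)."""
    if a[0] == b[0]:
        return 1          # same row: first associates
    if a[1] == b[1]:
        return 2          # same column: second associates
    return 3              # different row and column: third associates

def p_matrices(v1, v2):
    """Count p^i_{ju} by brute force on a v1 x v2 rectangle (v1, v2 >= 2)."""
    cells = list(product(range(v1), range(v2)))
    a = (0, 0)
    # one representative partner of `a` for each associate class
    partner = {1: (0, 1), 2: (1, 0), 3: (1, 1)}
    P = {}
    for i, b in partner.items():
        M = [[0] * 3 for _ in range(3)]
        for c in cells:
            if c != a and c != b:
                M[rect_class(a, c) - 1][rect_class(b, c) - 1] += 1
        P[i] = M
    return P

def p_formula(v1, v2):
    """Closed forms for the rectangular scheme, as in (1.5)."""
    return {
        1: [[v2 - 2, 0, 0], [0, 0, v1 - 1], [0, v1 - 1, (v1 - 1) * (v2 - 2)]],
        2: [[0, 0, v2 - 1], [0, v1 - 2, 0], [v2 - 1, 0, (v1 - 2) * (v2 - 1)]],
        3: [[0, 1, v2 - 2], [1, 0, v1 - 2], [v2 - 2, v1 - 2, (v1 - 2) * (v2 - 2)]],
    }

print(p_matrices(4, 6) == p_formula(4, 6))
```

Because the rectangular scheme looks the same from every cell, one representative pair per associate class is enough for the count.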

In this paper we shall be concerned with PBIB designs with three associate 
classes whose parameters satisfy the conditions (1.3), (1.5), (1.6) and (1.7) in 
addition to (1.2). We shall call the series of these designs the series A. A design 
belonging to the series A will be said to be symmetric if 


(1.8) v = b, and consequently, r = k. 


It may be noted that the series A includes all PBIB designs with three associate 
classes which are the Kronecker product of two BIB designs (cf. Vartak [2]). 

In the next section we shall show that the conditions (1.2) and (1.6) uniquely 
characterise the association scheme (1.4). We shall then obtain an expression 
for the matrix NN’ for any design belonging to the series A where N is the inci- 
dence matrix of the given design and N’ is the transpose of N. In Section 3 we 
shall calculate the characteristic roots and the determinant | NN’ | of the matrix 
NN’. We shall also calculate there the Hasse-Minkowski invariants, c,(NN’), 
for the matrix NN’ of any design belonging to the series A. 

Some non-existence theorems together with illustrations are given in Section 4. 
These theorems are direct consequences of the results obtained in Sections 2 
and 3, and consist of extensions of the results of Schützenberger [3] and 
Shrikhande [4] for symmetrical BIB designs, applicable to the designs of series A. 


2. The uniqueness of the rectangular association scheme. We shall first prove 
the following theorem on the uniqueness: 

THEOREM 2.1: If the parameters of a PBIB design N with three associate classes 
satisfy the conditions (1.2) and (1.6), i.e., if the design belongs to the series A, 
then the association scheme for its treatments is uniquely determined and is of the 
rectangular type (1.4). 
PROOF: From (1.2) and (1.6) we have, first of all, 


$$v = n_1 + n_2 + n_3 + 1 = (v_2 - 1) + (v_1 - 1) + (v_1 - 1)(v_2 - 1) + 1 = v_1 v_2,$$


which is the same as (1.3). Also from (1.2) and (1.6) it follows that the matrices 
(pju) are as given in (1.5). 

Let $\phi$ and $\theta$ be any two treatments of N which are first associates. 
Let $\phi_{11}, \ldots, \phi_{1n_1}$ be the $n_1$ first associates of $\phi$ and $\theta_{11}, \ldots, \theta_{1n_1}$ be the $n_1$ first 
associates of $\theta$. Then $\phi$ is one of the $\theta_{1i}$'s and $\theta$ is one of the $\phi_{1i}$'s $(i = 1, 2, \ldots, n_1)$. 
Let us say, for the sake of definiteness, that $\phi_{11} = \theta$ and $\theta_{11} = \phi$. Now, since by 
(1.6), $p^1_{11} = n_1 - 1 = v_2 - 2$, it follows that the sets $\{\phi_{1i}\}$ and $\{\theta_{1i}\}$ have exactly 
$v_2 - 2 = n_1 - 1$ treatments in common. From this and the earlier identifications 
$\phi_{11} = \theta$ and $\theta_{11} = \phi$, it follows that the sets $\{\phi_{1j}\}$ and $\{\theta_{1j}\}$ $(j = 2, 3, \ldots, n_1)$ 
are identical, i.e., consist of the same treatments. This means that any two 
treatments in the set $\{\phi, \theta, \theta_{12}, \ldots, \theta_{1n_1}\}$ $(n_1 = v_2 - 1)$, of $v_2$ treatments, are 
first associates and that the remaining $v_2 - 2$ treatments are first associates 
of each of them. This implies that the relation of being first associates is symmetric 
as well as transitive for all treatments of the design N. From this it follows 
that the $v = v_1 v_2$ treatments of the design N fall into $v_1$ groups of $v_2$ treatments 
each, such that the relation of being first associates is symmetric as well as 
transitive for the treatments of any of the $v_1$ groups. It is, therefore, convenient 
to designate these groups by 


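(2.1)    $$\begin{matrix}
(\theta_{11}, & \theta_{12}, & \cdots, & \theta_{1 v_2}) \\
(\theta_{21}, & \theta_{22}, & \cdots, & \theta_{2 v_2}) \\
\vdots & & & \vdots \\
(\theta_{v_1 1}, & \theta_{v_1 2}, & \cdots, & \theta_{v_1 v_2}).
\end{matrix}$$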


The property satisfied by any of these groups is that the first associates of 
any treatment in the group are the remaining treatments in the same group. 

Next, suppose that the second group in (2.1) contains two treatments $\theta_{2i}$ 
and $\theta_{2j}$ which are second associates of $\theta_{11}$. Since $\theta_{2i}$ and $\theta_{2j}$ are first 
associates of each other, this will mean that $p^2_{21} \ge 1$, which 
contradicts the result $p^2_{21} = 0$ obtained earlier and referred to in (1.5). This 
implies that the second, and in general any of the $v_1 - 1$ groups after the first, 
cannot contain more than one second associate of $\theta_{11}$. But $\theta_{11}$ has exactly 
$n_2 = v_1 - 1$ second associates so that the 2nd, 3rd, ..., $v_1$th group in (2.1) 
must each contain one and only one second associate of $\theta_{11}$. The same holds for 
each of $\theta_{12}, \ldots, \theta_{1 v_2}$. In general, therefore, the ith group contains one and 
only one second associate of $\theta_{jl}$ when $j \ne i$. Without any loss of generality, we 
can assume that $\theta_{2j}, \theta_{3j}, \ldots, \theta_{v_1 j}$ are the $n_2 = v_1 - 1$ second associates of $\theta_{1j}$. 

Further, we have $p^2_{22} = n_2 - 1 = v_1 - 2$, which, by the same type of argument 
as before, implies that the treatments $\theta_{1j}, \theta_{2j}, \ldots, \theta_{v_1 j}$ are such that the relation 
of being second associates is symmetric as well as transitive for them. The $v = v_1 v_2$ 
treatments of N, therefore, can be conveniently divided into $v_2$ groups of $v_1$ 
treatments each, such that the relation of being second associates is symmetric 
as well as transitive for the treatments of any group. 

The two modes of classification of the treatments of N for the relations of 
first and second association can be superimposed by writing the treatments in 
the form of a rectangular array (1.4). 

The third associates of any treatment $\theta_{ij}$ are, then, by exclusion, the $n_3 = 
(v_2 - 1)(v_1 - 1) = n_1 n_2$ treatments $\theta_{kl}$ in the array, where $k \ne i$ and $l \ne j$. 

The relation of association for the treatments of the design N can thus be 
described with the help of the association scheme (1.4), where the treatments 
occurring in the same row as $\theta_{ij}$ are its first associates, those occurring in the 
same column as $\theta_{ij}$ are its second associates, and the others are its third associates. 
In other words the association scheme is uniquely determined and is of the 
rectangular type. 

This proves the theorem. 

With the help of the association scheme (1.4), we can write down the matrix 
NN’ of the design N belonging to the series A in a very convenient form. Let 
the rows of N correspond to the treatments $\theta_{11}, \theta_{12}, \ldots, \theta_{1 v_2}, \theta_{21}, \ldots, \theta_{2 v_2}, 
\ldots, \theta_{v_1 1}, \ldots, \theta_{v_1 v_2}$ respectively, in this order. Then the matrix NN' is seen to 
have the following structure: 


(2.2)    $$NN' = \begin{pmatrix} A & B & \cdots & B \\ B & A & \cdots & B \\ \vdots & \vdots & \ddots & \vdots \\ B & B & \cdots & A \end{pmatrix},$$

where A is a $v_2 \times v_2$ square matrix given by 

(2.3)    $A = (r - \lambda_1) I_{v_2} + \lambda_1 E_{v_2},$

and B is a $v_2 \times v_2$ square matrix given by 

(2.4)    $B = (\lambda_2 - \lambda_3) I_{v_2} + \lambda_3 E_{v_2},$


$I_{v_2}$ being the identity matrix of order $v_2$ and $E_{v_2}$ a square matrix of order $v_2$ with 
all elements equal to 1. Also the matrix NN', as written in (2.2), has $v_1$ rows and 
$v_1$ columns of blocks. The same result can be summarized in the form of the following 
theorem: 

THEOREM 2.2: The matrix NN' for a design N belonging to the series A is given by 


(2.5)    $NN' = I_{v_1} \times (A - B) + E_{v_1} \times B,$


where '$\times$' denotes the Kronecker product of matrices and A and B are as defined 
in (2.3) and (2.4). 
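
As an added illustration (not from the original paper), the identity of Theorem 2.2 can be confirmed numerically. The sketch below fills a matrix entry by entry from the rectangular scheme (r on the diagonal, then the three lambda values for first, second and third associates) and compares it with the Kronecker-product expression. The function names and the sample parameter values are hypothetical, and the matrix is only the form NN' would take if a design with those parameters existed.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def ones(n):
    return [[1] * n for _ in range(n)]

def lin(a, X, b, Y):
    """Entrywise a*X + b*Y for equally sized matrices."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def kron(X, Y):
    """Kronecker product: block (i, j) of the result is X[i][j] * Y."""
    return [[x * y for x in rx for y in ry] for rx in X for ry in Y]

def nn_from_scheme(v1, v2, r, lam1, lam2, lam3):
    """Matrix indexed by cells (row, col) in row-major order."""
    cells = [(i, j) for i in range(v1) for j in range(v2)]
    def entry(s, t):
        if s == t:
            return r
        if s[0] == t[0]:
            return lam1     # same row: first associates
        if s[1] == t[1]:
            return lam2     # same column: second associates
        return lam3
    return [[entry(s, t) for t in cells] for s in cells]

def nn_from_kron(v1, v2, r, lam1, lam2, lam3):
    A = lin(r - lam1, identity(v2), lam1, ones(v2))
    B = lin(lam2 - lam3, identity(v2), lam3, ones(v2))
    A_minus_B = lin(1, A, -1, B)
    return lin(1, kron(identity(v1), A_minus_B), 1, kron(ones(v1), B))

print(nn_from_scheme(3, 4, 7, 3, 2, 1) == nn_from_kron(3, 4, 7, 3, 2, 1))
```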


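3. Characteristic roots, determinant and the Hasse-Minkowski invariants 
of NN'. Let $D_{v_2}$ be the $v_2 \times v_2$ square matrix given by 


(3.1)    $$D_{v_2} = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & -1 & 0 & \cdots & 0 \\ 1 & 1 & -2 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 1 & \cdots & -(v_2 - 1) \end{pmatrix}.$$


It should be observed that the matrix $D_{v_2}$ is a modified Helmert matrix. 
Moreover, the determinant $|D_{v_2}|$ of $D_{v_2}$ is clearly 


(3.2)    $|D_{v_2}| = (-1)^{v_2 - 1}\, v_2!,$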


so that $D_{v_2}$ is non-singular. In fact $D_{v_2}$ is a semi-orthogonal matrix in the sense 
that 


(3.3)    $D_{v_2} D'_{v_2} = \operatorname{diag}\{v_2,\ 1\cdot 2,\ 2\cdot 3,\ \ldots,\ (v_2 - 1)v_2\},$


where $\operatorname{diag}\{a_1, a_2, \ldots, a_m\}$ is a diagonal matrix of order m whose diagonal 
elements are $a_1, a_2, \ldots, a_m$ and off-diagonal elements are all zero. It is easy to 
verify that the matrix $D_{v_2}$ reduces both A and B to diagonal forms. Thus 


(3.4)    $D_{v_2} A D'_{v_2} = \operatorname{diag}\{v_2[r + (v_2 - 1)\lambda_1],\ 1\cdot 2\,(r - \lambda_1),\ 2\cdot 3\,(r - \lambda_1),\ \ldots,\ (v_2 - 1)v_2(r - \lambda_1)\}$

and 

(3.5)    $D_{v_2} B D'_{v_2} = \operatorname{diag}\{v_2[\lambda_2 + (v_2 - 1)\lambda_3],\ 1\cdot 2\,(\lambda_2 - \lambda_3),\ \ldots,\ (v_2 - 1)v_2(\lambda_2 - \lambda_3)\}.$


It may be noted that, since the elements of $D_{v_2}$ are all integral, the equations 
(3.4) and (3.5) can be interpreted to mean that A and B are both rationally 
equivalent to the diagonal forms exhibited on the right sides of (3.4) and (3.5). 
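
The reductions (3.3) and (3.4) are easy to check by machine. The sketch below is an added illustration that assumes the usual Helmert-type layout for the matrix D: a first row of ones, then for i = 1, 2, ... a row of i ones followed by -i and zeros. The function names and the sample values n = 5, r = 10, lambda_1 = 4 are hypothetical.

```python
def helmert(n):
    """Integer Helmert-type matrix: row 0 is all ones, row i is i ones, then -i, then zeros."""
    rows = [[1] * n]
    for i in range(1, n):
        rows.append([1] * i + [-i] + [0] * (n - i - 1))
    return rows

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

n, r, lam1 = 5, 10, 4
D = helmert(n)
Dt = transpose(D)
# A = (r - lam1) I + lam1 E, as in (2.3)
A = [[(r if i == j else lam1) for j in range(n)] for i in range(n)]

DDt = matmul(D, Dt)
DADt = matmul(matmul(D, A), Dt)
print(DDt == diag([n] + [i * (i + 1) for i in range(1, n)]))
print(DADt == diag([n * (r + (n - 1) * lam1)] + [i * (i + 1) * (r - lam1) for i in range(1, n)]))
```

Only integer arithmetic is used, which matches the remark that the reduction is a rational equivalence.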

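Now consider the matrix 


(3.6)    $$H = \begin{pmatrix} D_{v_2} & D_{v_2} & D_{v_2} & \cdots & D_{v_2} \\ D_{v_2} & -D_{v_2} & 0 & \cdots & 0 \\ D_{v_2} & D_{v_2} & -2D_{v_2} & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ D_{v_2} & D_{v_2} & D_{v_2} & \cdots & -(v_1 - 1)D_{v_2} \end{pmatrix},$$


where $D_{v_2}$ is the matrix given by (3.1) and H, as written above, has $v_1$ rows and 
$v_1$ columns of blocks, every 0 in (3.6) being a square null matrix of order $v_2 \times v_2$. It may 
be noted that the matrix H is the Kronecker product $D_{v_1} \times D_{v_2}$ and hence the 
determinant |H| of H is given by 


(3.7)    $|H| = |D_{v_1} \times D_{v_2}| = |D_{v_1}|^{v_2}\, |D_{v_2}|^{v_1} = (-1)^{v_1 + v_2}\, (v_1!)^{v_2}\, (v_2!)^{v_1}.$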


The characteristic roots of NN' of (2.2) are the roots of the determinantal 
equation in $\theta$: 


(3.8)    $|NN' - \theta I_v| = 0,$

where $I_v$ $(v = v_1 v_2)$ is the identity matrix of order v. 
From (2.5), we can write this in the form 

(3.9)    $|I_{v_1} \times \{(A - \theta I_{v_2}) - B\} + E_{v_1} \times B| = 0.$

However, it is easy to verify that 

(3.10)    $$\begin{aligned}
H\{NN' - \theta I_v\}H' &= H\{I_{v_1} \times ((A - \theta I_{v_2}) - B) + E_{v_1} \times B\}H' \\
&= \operatorname{diag}\{v_1 D_{v_2}[(A - \theta I_{v_2}) + (v_1 - 1)B]D'_{v_2},\ 1\cdot 2\, D_{v_2}[(A - \theta I_{v_2}) - B]D'_{v_2}, \\
&\qquad\quad \ldots,\ (v_1 - 1)v_1\, D_{v_2}[(A - \theta I_{v_2}) - B]D'_{v_2}\},
\end{aligned}$$


and since $D_{v_2} A D'_{v_2}$, $D_{v_2} B D'_{v_2}$ and $D_{v_2} D'_{v_2}$ are themselves diagonal matrices, so 
are $D_{v_2}\{(A - \theta I_{v_2}) - B\}D'_{v_2}$ and $D_{v_2}\{(A - \theta I_{v_2}) + (v_1 - 1)B\}D'_{v_2}$. Hence 
(3.10) reduces completely to a diagonal matrix. Writing 


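(3.11)    $$\begin{aligned}
\theta_0 &= rk = r + (v_2 - 1)\lambda_1 + (v_1 - 1)\lambda_2 + (v_1 - 1)(v_2 - 1)\lambda_3, \\
\theta_1 &= r - \lambda_1 + (v_1 - 1)(\lambda_2 - \lambda_3), \\
\theta_2 &= r - \lambda_2 + (v_2 - 1)(\lambda_1 - \lambda_3), \\
\theta_3 &= r - \lambda_1 - \lambda_2 + \lambda_3,
\end{aligned}$$

we find that (3.10) reduces to 

(3.12)    $$\begin{aligned}
H\{NN' - \theta I_v\}H' = \operatorname{diag}\{ & v_1 v_2(\theta_0 - \theta),\ v_1\, 1\cdot 2\,(\theta_1 - \theta),\ \ldots,\ v_1 (v_2 - 1)v_2(\theta_1 - \theta), \\
& 1\cdot 2\, v_2(\theta_2 - \theta),\ 1\cdot 2 \cdot 1\cdot 2\,(\theta_3 - \theta),\ \ldots,\ 1\cdot 2\,(v_2 - 1)v_2(\theta_3 - \theta), \\
& \ldots, \\
& (v_1 - 1)v_1 v_2(\theta_2 - \theta),\ (v_1 - 1)v_1\, 1\cdot 2\,(\theta_3 - \theta),\ \ldots,\ (v_1 - 1)v_1 (v_2 - 1)v_2(\theta_3 - \theta)\}.
\end{aligned}$$

Hence, taking the determinants of both sides, we get 

(3.13)    $|NN' - \theta I_v| = (\theta_0 - \theta)(\theta_1 - \theta)^{v_2 - 1}(\theta_2 - \theta)^{v_1 - 1}(\theta_3 - \theta)^{(v_1 - 1)(v_2 - 1)}.$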


Also the determinant |NN'| of the matrix NN' is the product of its characteristic 
roots. Hence from (3.13) and (3.11) we get the following theorem: 

THEOREM 3.1: 

(a) The characteristic roots of the matrix NN' of the design N of the series A 
are $\theta_0, \theta_1, \theta_2, \theta_3$ given by (3.11) and their respective multiplicities are 


(3.14)    $\alpha_0 = 1, \quad \alpha_1 = v_2 - 1 = n_1, \quad \alpha_2 = v_1 - 1 = n_2, \quad \alpha_3 = (v_1 - 1)(v_2 - 1) = n_3.$


(b) The determinant |NN'| of the matrix NN' of the design N is given by 

(3.15)    $$\begin{aligned}
|NN'| &= \theta_0\, \theta_1^{v_2 - 1}\, \theta_2^{v_1 - 1}\, \theta_3^{(v_1 - 1)(v_2 - 1)} \\
&= rk\,\{r - \lambda_1 + (v_1 - 1)(\lambda_2 - \lambda_3)\}^{v_2 - 1}\{r - \lambda_2 + (v_2 - 1)(\lambda_1 - \lambda_3)\}^{v_1 - 1}\{r - \lambda_1 - \lambda_2 + \lambda_3\}^{(v_1 - 1)(v_2 - 1)}.
\end{aligned}$$


To derive an expression for the Hasse-Minkowski invariant $c_p(NN')$ of the 
matrix NN', we note that, from (3.10), 


$$HNN'H' = \operatorname{diag}\{v_1 D_{v_2}[A + (v_1 - 1)B]D'_{v_2},\ 1\cdot 2\, D_{v_2}(A - B)D'_{v_2},\ 2\cdot 3\, D_{v_2}(A - B)D'_{v_2},\ \ldots,\ (v_1 - 1)v_1\, D_{v_2}(A - B)D'_{v_2}\}.$$

This can be further written as the direct sum of the matrix 
$v_1 D_{v_2}[A + (v_1 - 1)B]D'_{v_2}$ 
and the Kronecker product 
$\operatorname{diag}\{1\cdot 2,\ 2\cdot 3,\ \ldots,\ v_1(v_1 - 1)\} \times \{D_{v_2}(A - B)D'_{v_2}\}$. 
That is, we can write 

(3.16)    $HNN'H' = v_1 D_{v_2}\{A + (v_1 - 1)B\}D'_{v_2} \dotplus \operatorname{diag}\{1\cdot 2,\ 2\cdot 3,\ \ldots,\ v_1(v_1 - 1)\} \times \{D_{v_2}(A - B)D'_{v_2}\},$


where $\dotplus$ denotes the direct sum. 

We now make use of the following results for the c, invariants of the direct 
sum and the Kronecker product of matrices: 

If P and Q are symmetric matrices with rational elements whose $c_p$ invariants 
are defined and if 


$$U = P \dotplus Q \quad \text{and} \quad V = P \times Q,$$

then 


(3.17)    $c_p(U) = (-1, -1)_p\, c_p(P)\, c_p(Q)\, (|P|, |Q|)_p,$

and 

(3.18)    $c_p(V) = (-1, -1)_p^{m+n-1}\, \{c_p(P)\}^n \{c_p(Q)\}^m\, (|P|, -1)_p^{n(n-1)/2}\, (|Q|, -1)_p^{m(m-1)/2}\, (|P|, |Q|)_p^{mn-1},$


where m and n are the orders of P and Q respectively (cf. [5] and [6] 
respectively). 

Further we know that if $\lambda$ is a non-zero rational number and B is an $n \times n$ 
matrix whose Hasse-Minkowski invariants are defined, then 


(3.19)    $c_p(\lambda B) = c_p(B)\, (\lambda, -1)_p^{n(n+1)/2}\, (\lambda, |B|)_p^{n-1},$


where |B| is the determinant of B. 
It should be noted that HNN'H' of (3.16) is rationally equivalent to NN' and 
is a diagonal matrix. 
We are now in a position to prove the following theorem: 
THEOREM 3.2: The Hasse-Minkowski invariant $c_p(NN')$ of the matrix NN' for 
the design N of the series A is given by 


cp(NN’) = (—1, —1)p(6, — V) p( v1 0,01) 32 (v2 Oy ,0s)5)* (Oo, aor rr-» 


x { (A, 62) »( Oe, 93) p(s, 61) »} ewe 
(3.20) v2(v2—1) v1 (v,—1) (v1—1) (vg—1) (01 +02—2) 
X(A,—1)p 7 (62,—1)p * (hs,—1), ’ 
X (1, V2) p( G2, 1) p(Os, 11)5?*(6s, n)3, 
if the characteristic roots $\theta_0, \theta_1, \theta_2, \theta_3$ of the matrix NN', given by (3.11), are all 
non-zero. 
PROOF: Observe, in the first place, that 


(3.21)    $D_{v_2}\{A + (v_1 - 1)B\}D'_{v_2} = \operatorname{diag}\{v_2\theta_0,\ 1\cdot 2\,\theta_1,\ 2\cdot 3\,\theta_1,\ \ldots,\ v_2(v_2 - 1)\theta_1\},$

and that 

(3.22)    $D_{v_2}(A - B)D'_{v_2} = \operatorname{diag}\{v_2\theta_2,\ 1\cdot 2\,\theta_3,\ 2\cdot 3\,\theta_3,\ \ldots,\ v_2(v_2 - 1)\theta_3\}.$


Hence, when the characteristic roots $\theta_0, \theta_1, \theta_2, \theta_3$ are all non-zero, from (3.16) 
we find that all the leading principal minor determinants of the rationally equivalent 
diagonal form of NN' are different from zero; so that the Hasse-Minkowski 
invariants of this diagonal form and consequently that of the matrix NN' are 
defined. 

A little algebra shows that 


(3.23)    $c_p\{\operatorname{diag}(1\cdot 2,\ 2\cdot 3,\ \ldots,\ v_1(v_1 - 1))\} = (-1, -1)_p,$


(3.24)    $c_p\{D_{v_2}[A + (v_1 - 1)B]D'_{v_2}\} = (-1, -1)_p\, (\theta_0, -v_2)_p\, (\theta_0, \theta_1)_p^{v_2 - 1}\, (\theta_1, v_2)_p\, (\theta_1, -1)_p^{v_2(v_2 - 1)/2},$


(3.25)    $c_p\{D_{v_2}(A - B)D'_{v_2}\} = (-1, -1)_p\, (\theta_2, -v_2)_p\, (\theta_2, \theta_3)_p^{v_2 - 1}\, (\theta_3, v_2)_p\, (\theta_3, -1)_p^{v_2(v_2 - 1)/2}.$


Making use of (3.23), (3.24), (3.25) and (3.17), (3.18) and (3.19), it is 
possible to obtain (3.20) after a little calculation. 
This completes the proof of the theorem. 


4. The non-existence theorems with illustrations. Let N be a design of the 
series A characterised by (1.3), (1.5) through (1.7). Let $\theta$ be any characteristic 
root of NN' for this design. Then there exists a vector x, which may be taken to 
satisfy $x'x = 1$, such that 


(4.1)    $x' NN' x = \theta,$


which shows that $\theta$ is non-negative, since $x'NN'x = (N'x)'(N'x)$ is a sum of squares. 
This gives the following theorem: 
THEOREM 4.1: For a design in the series A to exist it is necessary that 

$$\begin{aligned}
\theta_1 &= r - \lambda_1 + (v_1 - 1)(\lambda_2 - \lambda_3) \ge 0, \\
\theta_2 &= r - \lambda_2 + (v_2 - 1)(\lambda_1 - \lambda_3) \ge 0, \\
\theta_3 &= r - \lambda_1 - \lambda_2 + \lambda_3 \ge 0.
\end{aligned}$$

The following examples illustrate the use of this theorem: 
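Example 4.1: Consider the symmetric (v = b and hence r = k) PBIB design 
of the series A given by 


$$v = b = 24, \quad r = k = 8, \quad n_1 = 5, \quad n_2 = 3, \quad n_3 = 15, \quad \lambda_1 = 4, \quad \lambda_2 = 7, \quad \lambda_3 = 1;$$


$$(p^1_{ju}) = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 0 & 3 \\ 0 & 3 & 12 \end{pmatrix}, \quad (p^2_{ju}) = \begin{pmatrix} 0 & 0 & 5 \\ 0 & 2 & 0 \\ 5 & 0 & 10 \end{pmatrix}, \quad (p^3_{ju}) = \begin{pmatrix} 0 & 1 & 4 \\ 1 & 0 & 2 \\ 4 & 2 & 8 \end{pmatrix}.$$


The characteristic roots of NN' for this design are 

$$\theta_0 = 64, \quad \theta_1 = 22, \quad \theta_2 = 16, \quad \theta_3 = -2;$$


and since $\theta_3 < 0$, Theorem 4.1 is contradicted. Hence the above PBIB design is 
impossible. 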
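Example 4.2: Consider the PBIB design of the series A given by 


$$v = 30, \quad b = 20, \quad r = 10, \quad k = 15, \quad n_1 = 4, \quad n_2 = 5, \quad n_3 = 20, \quad \lambda_1 = 10, \quad \lambda_2 = 8, \quad \lambda_3 = 3;$$


$$(p^1_{ju}) = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & 5 \\ 0 & 5 & 15 \end{pmatrix}, \quad (p^2_{ju}) = \begin{pmatrix} 0 & 0 & 4 \\ 0 & 4 & 0 \\ 4 & 0 & 16 \end{pmatrix}, \quad (p^3_{ju}) = \begin{pmatrix} 0 & 1 & 3 \\ 1 & 0 & 4 \\ 3 & 4 & 12 \end{pmatrix}.$$


The characteristic roots of NN' for this design are 

$$\theta_0 = 150, \quad \theta_1 = 25, \quad \theta_2 = 30, \quad \theta_3 = -5;$$


and since $\theta_3 < 0$, Theorem 4.1 is contradicted. Hence the above PBIB design 
is impossible. 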
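Example 4.3: Consider the PBIB design of the series A given by 


$$v = 30, \quad b = 50, \quad r = 10, \quad k = 6, \quad n_1 = 4, \quad n_2 = 5, \quad n_3 = 20, \quad \lambda_1 = 5, \quad \lambda_2 = 6, \quad \lambda_3 = 0;$$


$$(p^1_{ju}) = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & 5 \\ 0 & 5 & 15 \end{pmatrix}, \quad (p^2_{ju}) = \begin{pmatrix} 0 & 0 & 4 \\ 0 & 4 & 0 \\ 4 & 0 & 16 \end{pmatrix}, \quad (p^3_{ju}) = \begin{pmatrix} 0 & 1 & 3 \\ 1 & 0 & 4 \\ 3 & 4 & 12 \end{pmatrix}.$$


The characteristic roots of NN' for this design are 

$$\theta_0 = 60, \quad \theta_1 = 35, \quad \theta_2 = 24, \quad \theta_3 = -1;$$


and since $\theta_3 < 0$, Theorem 4.1 is contradicted. Hence the above PBIB design 
is impossible. 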
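
The root computations in Examples 4.1 to 4.3 can be reproduced with a few lines of Python. This sketch is an added illustration: the function names are hypothetical, the roots are evaluated from the closed forms of Section 3, and the values r = k = 8, lambda_1 = 4 (Example 4.1) and lambda_3 = 3 (Example 4.2) are inferred here from the stated roots together with relation (1.7), because they are not legible in the source.

```python
def char_roots(r, k, v1, v2, lam1, lam2, lam3):
    """Roots theta_0..theta_3 of NN' for a series A parameter set."""
    theta0 = r * k
    theta1 = r - lam1 + (v1 - 1) * (lam2 - lam3)
    theta2 = r - lam2 + (v2 - 1) * (lam1 - lam3)
    theta3 = r - lam1 - lam2 + lam3
    return theta0, theta1, theta2, theta3

def satisfies_1_7(r, k, v1, v2, lam1, lam2, lam3):
    """Relation (1.7) linking r, k and the lambdas."""
    rhs = (v2 - 1) * lam1 + (v1 - 1) * lam2 + (v1 - 1) * (v2 - 1) * lam3
    return r * (k - 1) == rhs

def passes_theorem_4_1(params):
    """Necessary condition only: theta_1, theta_2, theta_3 must be non-negative."""
    return all(t >= 0 for t in char_roots(*params)[1:])

# (r, k, v1, v2, lam1, lam2, lam3), with v1 = n2 + 1 and v2 = n1 + 1
examples = {
    "4.1": (8, 8, 4, 6, 4, 7, 1),
    "4.2": (10, 15, 6, 5, 10, 8, 3),
    "4.3": (10, 6, 6, 5, 5, 6, 0),
}
for name, params in examples.items():
    print(name, char_roots(*params), satisfies_1_7(*params), passes_theorem_4_1(params))
```

Passing the check does not establish existence; the theorem is a necessary condition only.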
In the case of a symmetric PBIB design of the series A we have v = b so that 
the matrix N is a square matrix of order $v = v_1 v_2$. The determinant |NN'| of 
the matrix NN' must therefore be a perfect square when $|N| \ne 0$. This condition 
can be formulated in the form of the theorem: 

THEOREM 4.2: A necessary condition for the existence of a symmetric PBIB 
design of the series A when $|N| \ne 0$ is that 

(a) if $v_1$ is even and $v_2$ is odd, then $\theta_2 = r - \lambda_2 + (v_2 - 1)(\lambda_1 - \lambda_3)$ is a perfect 
square, 

(b) if $v_2$ is even and $v_1$ is odd, then $\theta_1 = r - \lambda_1 + (v_1 - 1)(\lambda_2 - \lambda_3)$ is a perfect 
square, and 

(c) if $v_1$ and $v_2$ are both even, then $\theta_1\theta_2\theta_3$ $(\theta_3 = r - \lambda_1 - \lambda_2 + \lambda_3)$ is a perfect 
square. 

The following examples illustrate the application of this theorem: 

Example 4.4: Consider the design given by 


$$v = b = 66, \quad r = k = 14, \quad n_1 = 2, \quad n_2 = 21, \quad n_3 = 42, \quad \lambda_1 = 7, \quad \lambda_2 = 4, \quad \lambda_3 = 2;$$

$$(p^1_{ju}) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 21 \\ 0 & 21 & 21 \end{pmatrix}, \quad (p^2_{ju}) = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 20 & 0 \\ 2 & 0 & 40 \end{pmatrix}, \quad (p^3_{ju}) = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 20 \\ 1 & 20 & 20 \end{pmatrix}.$$


Clearly this design is a symmetric design (v = b) from the series A. Since $v_1 = n_2 
+ 1 = 22$ is an even integer and $v_2 = n_1 + 1 = 3$ is an odd integer and since 
$\theta_2 = r - \lambda_2 + (v_2 - 1)(\lambda_1 - \lambda_3) = 20$ is not a perfect square, it follows from 
Theorem 4.2 that the above PBIB design is impossible. It is easy to verify that 
$|N| \ne 0$. 
It may be observed that the parameters of the above PBIB design are obtained 
by taking the Kronecker product (cf. [2]) of the BIB designs 
$N_1$: $v_1 = b_1 = 22$, $r_1 = k_1 = 7$, $\lambda = 2$ 
and 
$N_2$: $v_2 = b_2 = 3$, $r_2 = k_2 = 2$, $\lambda = 1$, 
of which $N_1$ is already known to be non-existent (cf. Shrikhande [4]). 
Example 4.5: Consider the PBIB design given by 


$$v = b = 48, \quad r = k = 10, \quad n_1 = 7, \quad n_2 = 5, \quad n_3 = 35, \quad \lambda_1 = 5, \quad \lambda_2 = 4, \quad \lambda_3 = 1;$$


$$(p^1_{ju}) = \begin{pmatrix} 6 & 0 & 0 \\ 0 & 0 & 5 \\ 0 & 5 & 30 \end{pmatrix}, \quad (p^2_{ju}) = \begin{pmatrix} 0 & 0 & 7 \\ 0 & 4 & 0 \\ 7 & 0 & 28 \end{pmatrix}, \quad (p^3_{ju}) = \begin{pmatrix} 0 & 1 & 6 \\ 1 & 0 & 4 \\ 6 & 4 & 24 \end{pmatrix},$$


which is a symmetric (b = v) design from the series A. Here both $v_1$ and $v_2$ are 
even and the characteristic roots of NN' for this design are $\theta_0 = 100$, $\theta_1 = 20$, 
$\theta_2 = 34$, $\theta_3 = 2$. This implies that $|N| \ne 0$. Moreover, $\theta_1\theta_2\theta_3 = 1360$ is not a 
perfect square. It follows therefore from Theorem 4.2 that the above design 
is impossible. 
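
The parity cases of Theorem 4.2 lend themselves to a short check. The sketch below is an added illustration with hypothetical function names: it returns the quantity that the theorem requires to be a perfect square (or `None` when both v_1 and v_2 are odd, in which case the theorem imposes no condition), and applies it to the parameter sets of Examples 4.4 and 4.5. It assumes a symmetric design with non-singular N, as the theorem does.

```python
from math import isqrt

def is_square(m):
    return m >= 0 and isqrt(m) ** 2 == m

def required_square(r, v1, v2, lam1, lam2, lam3):
    """Quantity that must be a perfect square by Theorem 4.2, or None."""
    theta1 = r - lam1 + (v1 - 1) * (lam2 - lam3)
    theta2 = r - lam2 + (v2 - 1) * (lam1 - lam3)
    theta3 = r - lam1 - lam2 + lam3
    if v1 % 2 == 0 and v2 % 2 == 1:
        return theta2                       # case (a)
    if v2 % 2 == 0 and v1 % 2 == 1:
        return theta1                       # case (b)
    if v1 % 2 == 0 and v2 % 2 == 0:
        return theta1 * theta2 * theta3     # case (c)
    return None                             # both odd: no condition

ex_4_4 = required_square(14, 22, 3, 7, 4, 2)
ex_4_5 = required_square(10, 6, 8, 5, 4, 1)
print(ex_4_4, is_square(ex_4_4))
print(ex_4_5, is_square(ex_4_5))
```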

The Hasse-Minkowski invariant $c_p(NN')$ obtained in (3.20) gives us another 
non-existence theorem for the symmetric designs of the series A. 

Let N be a symmetric design of the series A with $|N| \ne 0$. Then the matrix 
NN' = B for this design is obviously rationally equivalent to $I_v$, the identity 
matrix of order $v = v_1 v_2$, since $NN' = N I_v N'$ with N rational and non-singular. 
Hence $c_p(NN')$ must be +1 for all odd primes p. If, 
for any design, $c_p(NN') = -1$ for some odd prime p, then that design will be 
impossible. 

We state this result as the following theorem: 

THEOREM 4.3: If N is a symmetrical design of the series A with | N | # 0, then 
a necessary condition for the design N to exist is that c,(NN’) = +1 for all odd 
primes p. 

The following examples illustrate the use of this theorem: 

Example 4.6: Consider the PBIB design given by 


$$v = b = 87, \quad r = k = 16, \quad n_1 = 28, \quad n_2 = 2, \quad n_3 = 56, \quad \lambda_1 = 4, \quad \lambda_2 = 8, \quad \lambda_3 = 2;$$


$$(p^1_{ju}) = \begin{pmatrix} 27 & 0 & 0 \\ 0 & 0 & 2 \\ 0 & 2 & 54 \end{pmatrix}, \quad (p^2_{ju}) = \begin{pmatrix} 0 & 0 & 28 \\ 0 & 1 & 0 \\ 28 & 0 & 28 \end{pmatrix}, \quad (p^3_{ju}) = \begin{pmatrix} 0 & 1 & 27 \\ 1 & 0 & 1 \\ 27 & 1 & 27 \end{pmatrix}.$$


This is evidently a symmetric design from the series A with $|N| \ne 0$. Further 
it is easy to verify that $c_p(NN')$ given by (3.20) reduces in this case to $(24, 29)_p$; 
further, for p = 3 this becomes $c_3(NN') = (2, 3)_3 = (2/3) = -1$, where (a/p) 
is the Legendre symbol of a with respect to the prime p. Thus Theorem 4.3 is 
contradicted and therefore the above design is impossible. 

It may be observed that the above design has a set of parameters which could 
be obtained by taking the Kronecker product of the BIB designs 

$N_1$: $v_1 = b_1 = 3$, $r_1 = k_1 = 2$, $\lambda = 1$ 
and 
$N_2$: $v_2 = b_2 = 29$, $r_2 = k_2 = 8$, $\lambda = 2$, 
of which $N_2$ is proved to be impossible (cf. Shrikhande [4]). 
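
The Hilbert symbols that appear in examples of this kind can be evaluated mechanically for odd primes. The sketch below is an added illustration, not part of the paper: it uses the standard closed form for non-zero integers written as a = p^alpha * u and b = p^beta * w with u and w prime to p, namely (a, b)_p = (-1)^(alpha*beta*(p-1)/2) * (u/p)^beta * (w/p)^alpha. The function names are hypothetical and the prime 2 is not handled.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p and an integer a prime to p (Euler's criterion)."""
    return 1 if pow(a % p, (p - 1) // 2, p) == 1 else -1

def split_power(a, p):
    """Write the non-zero integer a as p**e * u with u prime to p; return (e, u)."""
    e = 0
    while a % p == 0:
        a //= p
        e += 1
    return e, a

def hilbert_odd(a, b, p):
    """Hilbert symbol (a, b)_p for non-zero integers a, b and an odd prime p."""
    alpha, u = split_power(a, p)
    beta, w = split_power(b, p)
    value = -1 if (alpha * beta * (p - 1) // 2) % 2 else 1
    if beta % 2:
        value *= legendre(u, p)
    if alpha % 2:
        value *= legendre(w, p)
    return value

print(hilbert_odd(24, 29, 3))                            # the symbol (24, 29)_3
print(hilbert_odd(30, 7, 3) * hilbert_odd(30, -1, 3))    # the product (30, 7)_3 (30, -1)_3
```

Both printed values are -1, in agreement with the reductions to the Legendre symbol (2/3).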
Example 4.7: Consider the PBIB design given by 


$$v = b = 63, \quad r = k = 11, \quad n_1 = 8, \quad n_2 = 6, \quad n_3 = 48, \quad \lambda_1 = 4, \quad \lambda_2 = 5, \quad \lambda_3 = 1;$$


$$(p^1_{ju}) = \begin{pmatrix} 7 & 0 & 0 \\ 0 & 0 & 6 \\ 0 & 6 & 42 \end{pmatrix}, \quad (p^2_{ju}) = \begin{pmatrix} 0 & 0 & 8 \\ 0 & 5 & 0 \\ 8 & 0 & 40 \end{pmatrix}, \quad (p^3_{ju}) = \begin{pmatrix} 0 & 1 & 7 \\ 1 & 0 & 5 \\ 7 & 5 & 35 \end{pmatrix}.$$

This is obviously a symmetric design from the series A with $|N| \ne 0$. Further 
it is easy to verify that the Hasse-Minkowski invariant $c_p(NN')$ given by (3.20) 
reduces in this case to $(30, 7)_p (30, -1)_p$. For p = 3 this becomes $c_3(NN') = 
(2, 3)_3 = (2/3) = -1$, where (a/p) is the Legendre symbol of a with respect 
to the prime p. Thus Theorem 4.3 is contradicted and therefore the above PBIB 
design is impossible. 


5. Summary and acknowledgement. Three non-existence theorems are obtained 
for the PBIB designs with three associate classes and belonging to a 
certain series called the Series A. The first theorem makes use of the fact that the 
characteristic roots of the matrix NN' are always non-negative; the second is an 
extension of Schützenberger's result [3] and the third is an extension of 
Shrikhande's result [4] for symmetrical BIB designs. 

I wish to express my sincere thanks to Professor M. C. Chakrabarti for his 
kind interest in this work. Also I am indebted to the referee for useful suggestions 
especially in connection with Theorem 3.2. 
A NECESSARY CONDITION FOR EXISTENCE OF REGULAR AND 
SYMMETRICAL EXPERIMENTAL DESIGNS OF TRIANGULAR 
TYPE, WITH PARTIALLY BALANCED INCOMPLETE BLOCKS¹ 


By JUNJIRO OGAWA 
University of North Carolina 


A necessary condition for the existence of a symmetrical balanced incomplete 
block (B.I.B.) design in terms of the Hasse-Minkowski p-invariant was obtained 
by S. S. Shrikhande [1]. Similar necessary conditions for regular symmetrical 
group divisible designs and for regular symmetrical $L_2$ type designs were obtained 
by R. C. Bose and W. S. Connor [2] and S. S. Shrikhande [3] respectively. 

The purpose of this note is to give a necessary condition for the existence of a 
regular symmetrical partially balanced incomplete block (P.B.I.B.) design of 
triangular type in terms of the Hasse-Minkowski p-invariant. 


1. A necessary theorem and lemmas. Two symmetric and non-singular ma- 
trices A and B of the same order n with rational elements are said to be rationally 
congruent or congruent in the field of rational numbers, if there exists a non-singular 
and rational matrix C of the same order such that 


(1.1) C'AC = B, 


where C’ stands for the transposed matrix of C [4]. This relation is denoted by 
the symbol 


(1.2) A~B. 


By the very definition of the rational congruence, it will be clear that (i) 
$A \sim A$ (reflexive), (ii) if $A \sim B$, then $B \sim A$ (symmetric), (iii) if $A \sim B$ and 
$B \sim C$, then $A \sim C$ (transitive), (iv) $A \sim A^{-1}$, and (v) if $A \sim B$, then 
$A^{-1} \sim B^{-1}$. 

Hasse's Theorem [4, 5]. The necessary and sufficient conditions for two positive-definite, 
rational and symmetric matrices A and B of the same order to be rationally 
congruent are that, in the first place, the square-free parts of the determinants 
of both matrices are the same, and in the second, the Hasse-Minkowski 
p-invariants of both matrices coincide with each other for all primes p including 
$p_\infty$. 

If we denote the n leading principal minor determinants of A by 


$$D_1, D_2, \ldots, D_{n-1}, D_n = |A|$$

and let $D_0 = 1$, then [4] the Hasse-Minkowski p-invariant of A is given by 

(1.3)    $C_p(A) = (-1, -1)_p \prod_{i=0}^{n-1} (D_{i+1}, -D_i)_p$

for each prime p, where the symbol $(a, b)_p$ denotes the extended Hilbert symbol 
of norm residue [4, 6], which is defined by 


(1.4)    $$(a, b)_p = \begin{cases} +1, & \text{if } ax^2 + by^2 = 1 \text{ has a p-adic solution,} \\ -1, & \text{otherwise.} \end{cases}$$
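
Definition (1.3) can be turned into a small computation for odd primes. The sketch below is an added illustration under stated assumptions: the matrix has integer entries and non-vanishing leading principal minors, p is an odd prime, and the Hilbert symbol is evaluated by the standard closed form for odd p rather than by solving the p-adic equation of (1.4). All function names are hypothetical.

```python
from fractions import Fraction

def legendre(a, p):
    """Legendre symbol (a/p), p an odd prime, a prime to p."""
    return 1 if pow(a % p, (p - 1) // 2, p) == 1 else -1

def split_power(a, p):
    """Non-zero integer a = p**e * u with u prime to p; return (e, u)."""
    e = 0
    while a % p == 0:
        a //= p
        e += 1
    return e, a

def hilbert_odd(a, b, p):
    """Hilbert symbol (a, b)_p for non-zero integers and an odd prime p."""
    alpha, u = split_power(a, p)
    beta, w = split_power(b, p)
    value = -1 if (alpha * beta * (p - 1) // 2) % 2 else 1
    if beta % 2:
        value *= legendre(u, p)
    if alpha % 2:
        value *= legendre(w, p)
    return value

def leading_minors(A):
    """Leading principal minors D_1..D_n by exact elimination without pivoting."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    minors, det = [], Fraction(1)
    for k in range(n):
        pivot = M[k][k]
        if pivot == 0:
            raise ValueError("a leading principal minor vanishes")
        det *= pivot
        minors.append(det)
        for i in range(k + 1, n):
            f = M[i][k] / pivot
            for j in range(k, n):
                M[i][j] -= f * M[k][j]
    return minors

def c_p(A, p):
    """Hasse-Minkowski invariant of an integer symmetric matrix, following (1.3) with D_0 = 1."""
    D = [1] + [int(d) for d in leading_minors(A)]
    value = hilbert_odd(-1, -1, p)
    for i in range(len(A)):
        value *= hilbert_odd(D[i + 1], -D[i], p)
    return value

print(c_p([[12, 0, 0], [0, 6, 0], [0, 0, 2]], 3))
print(c_p([[4, 1], [1, 4]], 3))
```

The first matrix is the diagonal matrix with entries (v - i + 1)(v - i) for v = 4, and the second has the form eI + fG with e = 3, f = 1, so both can be compared by hand with the lemmas of this section.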


Now we shall list some useful properties of $C_p(A)$ as lemmas. 
LEMMA 1.1 [4]: If A and B are rational and symmetric and if 

$$U = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix},$$

then 

(1.5)    $C_p(U) = (-1, -1)_p\, (|A|, |B|)_p\, C_p(A)\, C_p(B).$

LEMMA 1.2 [4]: For an $n \times n$ diagonal matrix $\Delta_n$, whose i, i element is d, 

(1.6)    $C_p(\Delta_n) = (-1, -1)_p\, (-1, d)_p^{n(n+1)/2}.$


LEMMA 1.3: For a $(v - 1) \times (v - 1)$ diagonal matrix U, whose i, i element is 
$(v - i + 1)(v - i)$, 

(1.7)    $C_p(U) = (-1, -1)_p.$

LEMMA 1.4 [4]: 

(1.8)    $C_p(\rho A) = (-1, \rho)_p^{n(n+1)/2}\, (\rho, |A|)_p^{n-1}\, C_p(A).$
LEMMA 1.5: If the n - 1 rational vectors 

$$a_2, \ldots, a_n$$

of dimensionality n are linearly independent and are orthogonal to 

$$1' = (1\ 1 \cdots 1),$$

then the Gramian of the set, i.e., 

$$U = \| a_2 \cdots a_n \|' \, \| a_2 \cdots a_n \|,$$

has the p-invariant $C_p(U) = (-1, -1)_p$. 

Lemma 1.6: So long as we restrict ourselves to rational vectors, the p-invariant of 
a vector set, i.e., the p-invariant of the Gramian of the set is uniquely determined 
by the linear subspace generated by the vectors of the set. 

LEMMA 1.7: For a matrix A of the form 


$$A = eI_n + fG_n,$$

where $I_n$ is the unit matrix of order n and $G_n$ is the $n \times n$ matrix whose elements 
are all unity, 


(1.9)    $C_p(A) = (-1, -1)_p\, (-1, e)_p^{n(n-1)/2}\, (-1, g)_p\, (n, g)_p\, (n, e)_p\, (g, e)_p^{n-1},$


where we have put 


(1.10)    $g = e + nf.$


Next we shall summarize the necessary properties of Hilbert's symbol [4, 6] 
and some of the fundamental properties of the Legendre symbol (a/p) of the 
quadratic residue [6]. 

First of all, from the definition of $(a, b)_p$, it is clear that 


(1.11)    $(a, b)_p = (b, a)_p,$

and for any non-zero rational numbers t and u, 

(1.12)    $(at^2, bu^2)_p = (a, b)_p.$


Hence in any calculation handling the Hilbert symbol, the square part of any 
rational number can be replaced by 1. Further, 


(1.13)    $(a, -a)_p = +1, \qquad (a, a)_p = (-1, a)_p, \qquad (a, b_1 b_2)_p = (a, b_1)_p\, (a, b_2)_p,$

and [2, 4] 


(1.14)    $(a, b)_p = (-ab, a + b)_p.$

As a special case of (1.14), we have for every positive integer n: 

(1.15)    $(n, n + 1)_p = (-1, n + 1)_p.$

(1.16)    If (ab, p) = 1, then $(a, b)_p = +1$. 

(1.17)    For an odd prime p, $(p, a)_p = (a/p)$. 

For the even prime 2, we have 

(1.18)    $(p, q)_2 = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}, \quad (2, p)_2 = (2/p), \quad (-1, p)_2 = (-1/p), \quad (-1, 2)_2 = +1, \quad (-1, -1)_2 = -1.$

And for $p = \infty$, we have 

(1.19)    $(p, q)_\infty = (-1, 2)_\infty = (2, p)_\infty = (-1, p)_\infty = +1, \qquad (-1, -1)_\infty = -1.$

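In the above and hereafter, p and q denote odd primes. 
For the Legendre symbol, the following properties are fundamental: 


(1.20)    $(a/p) = (b/p)$ if $a \equiv b \pmod p$, 
(1.21)    $(ab/p) = (a/p)(b/p),$


and the reciprocity law [6] 


(1.22)    $\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}},$


supplemented by 


(1.23)    $\left(\frac{-1}{p}\right) = (-1)^{\frac{p-1}{2}}, \qquad \left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}}.$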


2. A P.B.I.B. design of triangular type. Triangular association is defined as 
follows: The number of elements is v = n(n — 1)/2, where n is a positive integer. 
We take an n X n square, and fill the n(n — 1)/2 positions above the main diag- 
onal by the different elements, taken in order. The positions in the main diagonal 
are left blank, while the positions below the main diagonal are filled so that the 
scheme is symmetrical with respect to the main diagonal. Two elements in the 
same column are lst associates, whereas two elements which do not occur in the 
same column are 2nd associates. 

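In this association each element has $n_i$ ith associates, where 


$$n_1 = 2n - 4, \qquad n_2 = \frac{(n - 2)(n - 3)}{2}.$$


The parameters of association are as follows: 


$$p^1_{11} = n - 2, \qquad p^1_{12} = n - 3 = p^1_{21}, \qquad p^1_{22} = \frac{(n - 3)(n - 4)}{2},$$

$$p^2_{11} = 4, \qquad p^2_{12} = 2n - 8 = p^2_{21}, \qquad p^2_{22} = \frac{(n - 4)(n - 5)}{2}.$$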
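
These counts can be confirmed by enumeration. In the sketch below, added here as an illustration, each element is identified with an unordered pair of symbols from {0, ..., n-1} (the two coordinates of its position above the diagonal), so that two elements are first associates exactly when their pairs share a symbol. That identification and the function names are assumptions made for the example; the check needs n >= 4.

```python
from itertools import combinations

def triangular_parameters(n):
    """Brute-force n_1, n_2 and the matrices (p^i_{jk}) for the triangular scheme, n >= 4."""
    elems = list(combinations(range(n), 2))

    def cls(x, y):
        # first associates share a symbol, second associates do not
        return 1 if set(x) & set(y) else 2

    a = (0, 1)
    counts = [sum(1 for y in elems if y != a and cls(a, y) == i) for i in (1, 2)]
    partner = {1: (0, 2), 2: (2, 3)}   # a first and a second associate of `a`
    P = {}
    for i, b in partner.items():
        M = [[0, 0], [0, 0]]
        for c in elems:
            if c != a and c != b:
                M[cls(a, c) - 1][cls(b, c) - 1] += 1
        P[i] = M
    return counts, P

def triangular_formulas(n):
    """Closed forms for n_1, n_2 and (p^i_{jk}) as stated in the text."""
    counts = [2 * n - 4, (n - 2) * (n - 3) // 2]
    P = {
        1: [[n - 2, n - 3], [n - 3, (n - 3) * (n - 4) // 2]],
        2: [[4, 2 * n - 8], [2 * n - 8, (n - 4) * (n - 5) // 2]],
    }
    return counts, P

print(all(triangular_parameters(n) == triangular_formulas(n) for n in range(4, 10)))
```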
Let the association matrices be $A_0 = I_v$, $A_1$, $A_2$; then it is known that these 
matrices generate a commutative linear associative algebra $\mathfrak{A}$ of rank 3, and the 
regular representation $(\mathfrak{A})$ is given [7] by 


(2.1)    $$(\mathfrak{A}): \quad A_0 \to I_3, \quad A_1 \to \mathcal{P}_1 = \begin{pmatrix} 0 & 1 & 0 \\ 2n - 4 & n - 2 & 4 \\ 0 & n - 3 & 2n - 8 \end{pmatrix}, \quad A_2 \to \mathcal{P}_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & n - 3 & 2n - 8 \\ \frac{(n-2)(n-3)}{2} & \frac{(n-3)(n-4)}{2} & \frac{(n-4)(n-5)}{2} \end{pmatrix}.$$

This regular representation $(\mathfrak{A})$ decomposes into three non-equivalent linear 
representations 


(2.2)    $$\begin{aligned}
\chi_0 &: A_0 \to 1, \quad A_1 \to n_1 = 2n - 4, \quad A_2 \to n_2 = (n - 2)(n - 3)/2, \\
\chi_1 &: A_0 \to 1, \quad A_1 \to n - 4, \quad A_2 \to -(n - 3), \\
\chi_2 &: A_0 \to 1, \quad A_1 \to -2, \quad A_2 \to 1,
\end{aligned}$$


having respective multiplicities 

(2.3)    $\alpha_0 = 1, \qquad \alpha_1 = n - 1, \qquad \alpha_2 = n(n - 3)/2$


in the algebra $\mathfrak{A}$. 

Suppose that we are given v treatments having triangular association among 
them and b blocks each having k experimental units in such a way that 

(1) each block contains k different treatments, 

(2) each treatment occurs in r blocks, and 

(3) any two treatments occur together in A; blocks, if they are ith associates. 
This design is called a P.B.1.B. design of triangular type. 

If the incidence matrix of this design is denoted by N, it is also well known 
([7], [8]) that 


(2.4)    $NN' = rA_0 + \lambda_1 A_1 + \lambda_2 A_2.$


Hence NN' has eigenvalues 


(2.5)    $$\begin{aligned}
\rho_0 &= r + (2n - 4)\lambda_1 + \frac{(n - 2)(n - 3)}{2}\lambda_2, \\
\rho_1 &= r + (n - 4)\lambda_1 - (n - 3)\lambda_2, \\
\rho_2 &= r - 2\lambda_1 + \lambda_2,
\end{aligned}$$

with multiplicities 1, (n - 1) and n(n - 3)/2 respectively. 

It can be shown from the elements of linear associative algebra [9] that there 
exist three mutually orthogonal and symmetric idempotents $A_0^* = (1/v)G_v$, 
$A_1^*$ and $A_2^*$ with respective ranks 1, n - 1, and n(n - 3)/2, such that 

(2.6)    $NN' = \rho_0 A_0^* + \rho_1 A_1^* + \rho_2 A_2^*.$


The column vectors of A; generate the eigenspace of NN’ corresponding to the 
eigenvalue p; . Let us assume, without any loss of generality, that 


O* 1* 1* 2* 2% 
@, , @2 ,°°* , An , Anz1, *°* » Ay 
are linearly independent, and let us put 
O* 1% 1* 2% o* 
(2.7) S = |la: a2 «+ @n Angi +’ a |i, 


then S is a non- re v X v matrix with rational elements. Further let 


Qe’ 1) 


| a” || n+ 





(28) Q= 


Ba {| a2* wai, , rE and Q: = I ans: --al* ll, 








I: 
an | Be 
then from (2.6) it follows that


|par O 0 | 
S'NN'S = | 0 aQd: oO |, 
\ 0 0 p2 Qe |, 
or 
re @ 9) 
! v } 
(2.9 NN' ~ 
' 0 rg 0 | 
0 0 p2 Qe 
Since 
eg 
v 
S'S = | 
"=o Q Ol 
00 & 
we get 
(2.10) v|Q: |} Q2| ~ 1. 
It has been shown by Corsten [10] that 
n—1 1 tee 1 
(2.11) 0 Tet toma d ee 2 HL 
o& 1 1 n—l 
hence 
(2.12) 1Qi:| ~ n(n — 2)"". 


3. Necessary conditions for the existence of a regular symmetrical P.B.I.B. 
design of triangular type. In this section, we shall show the non-existence of cer- 
tain regular symmetrical P.B.I.B. designs of triangular type. 

If the design is symmetrical, i.e., v = b and r = k, then the incidence matrix 
N is a square matrix with elements 0 and 1, hence in the regular case | NN’ | 
must be a perfect square. Thus first of all 
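\[ \rho_1^{\,n-1}\rho_2^{\,n(n-3)/2} = [r + (n - 4)\lambda_1 - (n - 3)\lambda_2]^{n-1}\,[r - 2\lambda_1 + \lambda_2]^{n(n-3)/2} \sim 1 \tag{3.1} \]
and then, since $NN' \sim I_v$, we have
\[ C_p(NN') = (-1, -1)_p \tag{3.2} \]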
for all primes p. (3.1) and (3.2) are necessary conditions for the existence. 

Now, from (2.9) we get 
(3.3) Cpl NN’) = (-1, -1 ) vf —l, v) p(v, aa | Q: | Q: \)p 


(ot | Qi], 02" | Qs |)» Co(oQ1) -Cp(o29r). 
By Lemma 1.4,
’ n(n—1) 
(3.4) C (pi Q,) = (—1, pip (pr, | Qa \)370,(Q1) 


n(n—1)(n—2)(n—3) 


(35) C(mQ:) =(—lp)p (pe, | Q2|)3"""C,(Q2). 


Since 





! \ n— n(n—3) 
v|Q: || Q2| ~ 1, pr ph ~w I, 
it follows that 
(3.6) (v, pr ae ” | Qs | Q> \)»p = = (v,v)p = (—1,2)>, 


n—l \ $n(n—3) | n—l n—l 
(pr | Qi], 02"" ” | Qe|), (pr |Q:|, ot a 


(3.7) a 
“57 (pi, Y)p (-1 a ‘(| Q:|, | Q: |), 
and 
(pr, | Qe er = (p2, | Q2|)p(—r,¥| Qi ie 
(3.8) 


( pe ’ | Q2 |) p( px ’ v)> (pr ? Q: ne 
Substituting (3.4) to (3.8) into (3.3), we get 
C,(NN’) = (—1, —1)p(o1, 0) 37 —1, 01) 371 QI, | Qe |)» 


n(n— 1) 2 n(n—1)(n—2)(n—3) 
-(- 1, pi)> oy (p., lQ\)> ¢ ~l, p2) p ° 
(pe, | Qe |) (pr, v)> (pr, |Q: > Cy(Q1)Cp(Q2) 
(n—1)(n—-2) n(n— 1)(n— 2)(n—3) 
= (—1, —1),(-—1, pi)» . (—1, pe) P (pr, | Qil)» 


(pe, | Qe pl | 0; |, | Qe |)pCp(Qr) C; (Qe), 


whereas by Lemma 1.5 


(| Q: ly | Q: |) pCp(Q1)Cp(Qz2) = +1 


and 
Q:|~n(n —2)"", | Qo| ~ 2(m — 1)(n — 2)"", 
therefore 
(n—1)(n—2) : i 
(3.9) C,| NN’) = (—1, —1) p\—1 » Pi) p : (pi, n)p(pr, _— 2)> 


n(m—1)(n—2)(n—8) 
-(—1, p2)» . (p2, 2)» (2,2 — 1), (p2,m — 2), 
Consequently (3.2) becomes 


(n—1)(n—2) n(n—1) (n—2) (n—3) 


(3.10) 0, =(-1,m)> * (pi, 2)p(—r,m — 2)3-*(—1, pa)» 8 
*(p2,2)p(p2,m — 1)9(p2,n — 2)>° = +1 


for all primes p. 
4. Examples of non-existent P.B.I.B. designs of triangular type.
(1) $n = 7$; $v = b = 21$, $r = k = 6$, $\lambda_1 = 0$, $\lambda_2 = 3$,
$\rho_1 = -6$, $\rho_2 = 9$,
\[ O_p = (-1, -6)_p(-6, 7)_p = (-1, -1)_p(-1, 2)_p(-1, 3)_p(-1, 7)_p(2, 7)_p(3, 7)_p, \]
\[ O_3 = \left(\frac{-1}{3}\right)\left(\frac{7}{3}\right) = -1. \]
Hence this design is impossible.
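
The symbol evaluations in these examples can be reproduced mechanically. The sketch below uses the standard closed form of the Hilbert norm residue symbol for an odd prime p, namely $(a, b)_p = (-1)^{\alpha\beta(p-1)/2}(u/p)^{\beta}(v/p)^{\alpha}$ with $a = p^{\alpha}u$, $b = p^{\beta}v$; this formula is taken from the general theory rather than from the text above, the function names are illustrative, and the prime 2 is not handled.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def split_unit(a, p):
    """Write the nonzero integer a as p**alpha * u with u prime to p."""
    alpha = 0
    while a % p == 0:
        a //= p
        alpha += 1
    return alpha, a


def hilbert(a, b, p):
    """Hilbert symbol (a, b)_p for nonzero integers a, b and an odd prime p."""
    alpha, u = split_unit(a, p)
    beta, v = split_unit(b, p)
    sign = -1 if (alpha * beta * ((p - 1) // 2)) % 2 else 1
    return sign * legendre(u, p) ** beta * legendre(v, p) ** alpha


def product(pairs, p):
    """Product of Hilbert symbols (a, b)_p over a list of pairs (a, b)."""
    out = 1
    for a, b in pairs:
        out *= hilbert(a, b, p)
    return out
```

For example (1), the product $(-1, \rho_1)_p(\rho_1, 7)_p$ with $\rho_1 = -6$ is `product([(-1, -6), (-6, 7)], 3)`, which evaluates to -1, while the same product at p = 7 evaluates to +1.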
(2) $n = 7$; $v = b = 21$, $r = k = 10$, $\lambda_1 = 0$, $\lambda_2 = 9$,
$\rho_1 = -26$, $\rho_2 = 19$,
\[
\begin{aligned}
O_p &= (-1, -26)_p(-26, 7)_p(19, 2)_p(19, 6)_p(-1, 19)_p \\
    &= (-1, -1)_p(-1, 2)_p(-1, 13)_p(-1, 7)_p(2, 7)_p(13, 7)_p(19, 3)_p(-1, 19)_p,
\end{aligned}
\]


o- @)@)-@)-@)-- 


Hence this design is impossible. 


(3) $n = 7$; $v = b = 21$, $r = k = 10$, $\lambda_1 = 1$, $\lambda_2 = 8$,
$\rho_1 = -19$, $\rho_2 = 16$,
\[ O_p = (-1, -19)_p(-19, 7)_p = (-1, -1)_p(-1, 19)_p(-1, 7)_p(19, 7)_p, \]


= @)@)-@)-O-@- 


Hence this design is impossible. 
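(4) $n = 7$; $v = b = 21$, $r = k = 10$, $\lambda_1 = 2$, $\lambda_2 = 7$,
$\rho_1 = -12$, $\rho_2 = 13$,
\[
\begin{aligned}
O_p &= (-1, -12)_p(-12, 7)_p(13, 2)_p(13, 6)_p(-1, 13)_p \\
    &= (-1, -1)_p(-1, 3)_p(-1, 7)_p(3, 7)_p(13, 3)_p(-1, 13)_p,
\end{aligned}
\]
\[ O_3 = \left(\frac{-1}{3}\right)\left(\frac{7}{3}\right)\left(\frac{13}{3}\right) = \left(\frac{-1}{3}\right) = -1. \]
Hence this design is impossible.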
(5) $n = 7$; $v = b = 21$, $r = k = 10$, $\lambda_1 = 3$, $\lambda_2 = 6$,
$\rho_1 = -5$, $\rho_2 = 10$,
\[
\begin{aligned}
O_p &= (-1, -5)_p(-5, 7)_p(10, 2)_p(10, 6)_p(-1, 10)_p \\
    &= (-1, -1)_p(5, 7)_p(-1, 7)_p(2, 7)_p(2, 3)_p(5, 3)_p,
\end{aligned}
\]
\[ O_2 = (-1, -1)_2(5, 7)_2(-1, 7)_2(2, 7)_2(2, 3)_2(5, 3)_2 = -1. \]
Hence this design is impossible.


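(6) $n = 7$; $v = b = 21$, $r = k = 10$, $\lambda_1 = 8$, $\lambda_2 = 1$,
$\rho_1 = 30$, $\rho_2 = -5$,
\[ O_p = (-1, 30)_p(7, 30)_p(-1, -5)_p(2, -5)_p(-5, 6)_p, \]
\[ O_3 = \left(\frac{-1}{3}\right)\left(\frac{7}{3}\right)\left(\frac{-5}{3}\right) = \left(\frac{-1}{3}\right) = -1. \]
Hence this design is impossible.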
OPTIMAL SPACING IN REGRESSION ANALYSIS


By H. A. David and Beverly E. Arens


Virginia Polytechnic Institute 


1. Introduction and summary. When a response (or dependent) variable y 
can be observed for a continuous range of values of the independent variable x, 
which is at the control of the experimenter, the question arises as to how a given 
number of observations should be spaced. It will be assumed that x is measurable 
without error and that y differs from the true response function f(x) by a random
term z with mean zero and constant variance $\sigma^2$. We suppose that the aim of the
experimenter is to estimate f(x), or possibly the mean response f(x), on the basis
of n observations $(x_i, y_i)$.

Various aspects of this problem of optimal spacing have been studied for the 
case where f(x) is known apart from some parameters (see e.g. Elfving [3], 
Chernoff [1], de la Garza [2], and Kiefer and Wolfowitz [8]). However, the func- 
tional form of f(x) is often unknown or only approximately known. In the absence 
of a specific model to the contrary, polynomial approximations to f(x) provide 
a convenient approach. Section 2 deals briefly with the non-statistical case
$\sigma = 0$ when the problem of choosing n abscissae in order to approximate to f(x)
by a polynomial of degree n - 1 reduces to one of optimum interpolation and
that of integrating f(x) reduces to Gaussian quadrature. For a fuller account of
this part see Hildebrand [5] or Kopal [6].

If the response contains a random element, a polynomial of degree n — 1 or 
less may be fitted to the n observations by least squares. The error of approxima- 
tion will now be due, in general, both to random error and the use of an incorrect 
approximating function. We confine ourselves to the case of fitting a straight 
line when the true response, while roughly linear, may contain a quadratic com- 
ponent. Two criteria are considered in arriving at the two abscissae resulting in 
an optimal fit. The first of these criteria ((3.2) below) has also been discussed in 
a recent paper by Box and Draper [7] who have extended its use to the case of 
several independent variables. 

It is shown in Section 6 that for x-values symmetrically spaced about the centre 
of the region of interest nothing is gained in fitting a straight line by the use of 
more than two such abscissae. These optimal abscissae are determined in Sections 
3 and 4. 

The emphasis of the present approach is on attaining an optimal straight line 
fit with a small number of observations, rather than on detecting departures from 
linearity. For the latter purpose more than two abscissae would, of course, be 
needed, but the number of observations required may well be uneconomically 
large. In Section 7 comparisons with some other simple spacings are made. 
As an illustration, consider the calibration of a large number of instruments
for a range of x in which f(x) is known to be approximately linear. In this case
adequate accuracy may be attainable by the use of two observations only. If $\sigma$
is not negligibly small several observations may be taken at each of two
appropriately selected settings, especially if it is much easier to repeat measurements
at a given setting than to turn to a new one (compare de la Garza [2]).

An example illustrating the methods proposed is given in Section 8. 


2. Optimal spacing in the absence of random error. We suppose that the region
of interest of the independent variable is finite and that it has been transformed
into the closed interval (-1, 1). If $g_{n-1}(x)$, a polynomial of degree n - 1, agrees
with f(x) at the n abscissae $x_1, x_2, \ldots, x_n$, and if f(x) has n continuous
derivatives in (-1, 1) the remainder $R(x) = f(x) - g_{n-1}(x)$ may be expressed as

\[ R(x) = \pi(x)\,\frac{f^{(n)}(\xi)}{n!}, \]

where $\pi(x) = (x - x_1)(x - x_2)\cdots(x - x_n)$, and $|\xi| < 1$. In order to make
$g_{n-1}(x)$ a desirable approximating function it is natural to attempt to minimize
|R(x)| in some sense by an appropriate choice of abscissae. However, $\xi$ depends,
in general, not only on the abscissae but also on x and the nature of the function
f(x). It is therefore customary to content oneself with the minimization, in the
sense chosen, of $|\pi(x)|$. If f(x) is a polynomial of degree n, |R(x)| will also be
minimized, but more generally the minimization of |R(x)| will be only
approximate (compare [5], Section 9.6).
We consider the following two alternative requirements:

\[ \int_{-1}^{1} \pi^2(x)\,dx = \min, \tag{2.1} \]

\[ \max_{(-1,1)} |\pi(x)| = \min. \tag{2.2} \]

The first is a criterion of closest overall fit and gives the abscissae as the n zeros
of the Legendre polynomial $P_n(x)$ of degree n; the second results in abscissae
which are the zeros of the Tchebysheff polynomial $T_n(x) = \cos(n \cos^{-1} x)$.
Corresponding to these two cases we shall speak of Legendre and Tchebysheff
spacing. Generally, the latter would be regarded as more appropriate in the
problem of calibration outlined in the introduction.

Criteria (2.1) and (2.2) may also be given a statistical interpretation. To this
end we note that (2.2) can be shown (e.g. [5], Section 9.6) to be equivalent to

\[ \int_{-1}^{1} \frac{\pi^2(x)}{(1 - x^2)^{1/2}}\,dx = \min. \tag{2.3} \]

Suppose $g_{n-1}(x)$ is required for a value of x chosen randomly in (-1, 1). Then,
clearly, $\mathscr{E}[\pi^2(x)]$ is minimized by (2.1) if x is uniformly distributed in (-1, 1)
and by (2.2) if $\cos^{-1} x$ is uniformly distributed in $(0, \pi)$.

A further advantage of the above spacings is that the integral approximation

\[ \int_{-1}^{1} w(x) f(x)\,dx \approx \int_{-1}^{1} w(x)\, g_{n-1}(x)\,dx \tag{2.4} \]

is a Gaussian quadrature formula with weight function w(x) = 1 for Legendre
spacing and $w(x) = (1 - x^2)^{-1/2}$ for Tchebysheff spacing (see e.g. [6], Chapter
VII). Thus if the integral of f(x) over (-1, 1) is required it is given by

\[ \sum_{i=1}^{n} H_i f(x_i) + E_n, \tag{2.5} \]

where the $H_i$ are tabulated weights, the $x_i$ are the zeros of the Legendre
polynomial of degree n, and the error of integration $E_n$ is given by

\[ E_n = \frac{2^{2n+1}(n!)^4}{(2n + 1)[(2n)!]^3}\, f^{(2n)}(\eta). \tag{2.6} \]

The integration formula (2.5), although it uses only n ordinates, is therefore
of degree of precision 2n - 1, i.e., the integration is exact if f(x) is a polynomial
of degree 2n - 1 or less. For a general function f(x), (2.5) can be shown to be
optimal in the sense that the coefficient of $f^{(2n)}(\eta)$ in (2.6) is smaller than for
any other integration formula of degree of precision 2n - 1.
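
The degree of precision claimed for (2.5) can be illustrated for n = 2, where the Legendre abscissae are ±1/√3 and both weights equal 1. The sketch below is illustrative only (function names are not from the text); it also evaluates the coefficient in the standard Gauss-Legendre error term, written here on the assumption that (2.6) has its usual form.

```python
from math import factorial, sqrt


def gauss_legendre_2(f):
    """Two-point Gauss-Legendre rule on (-1, 1): nodes +-1/sqrt(3), weights 1."""
    x = 1 / sqrt(3)
    return f(-x) + f(x)


def error_coefficient(n):
    """Coefficient of f^(2n)(eta) in the usual n-point Gauss-Legendre error term."""
    return 2 ** (2 * n + 1) * factorial(n) ** 4 / ((2 * n + 1) * factorial(2 * n) ** 3)


# A cubic is integrated exactly (degree of precision 2n - 1 = 3).
cubic_exact = 8 / 3                      # integral of x^3 + x^2 + 1 over (-1, 1)
cubic_rule = gauss_legendre_2(lambda x: x ** 3 + x ** 2 + 1)

# A quartic is not: the shortfall equals error_coefficient(2) * f''''(eta), with f'''' = 24.
quartic_gap = 2 / 5 - gauss_legendre_2(lambda x: x ** 4)
```

Here `cubic_rule` agrees with `cubic_exact` up to rounding, and `quartic_gap` equals `error_coefficient(2) * 24`, that is 8/45.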


3. Criteria for optimal spacing in the presence of random error in the observed
response. We take the observed response to be

\[ y(x) = f(x) + z \]

where f(x) is the true response and z is a variate with zero mean and variance
$\sigma^2$ independent of x. As stated in the introduction we shall consider specifically
the case where f(x) is a quadratic while the fitted curve is a straight line. We
suppose that $\tfrac12 n$ observations are taken at each of $x_1, x_2$ $(x_1 < x_2)$ and that the
corresponding observed mean responses are $\bar y_1, \bar y_2$. The use of more than two
abscissae is discussed in Section 6.

The fitted straight line is then

\[ Y(x) = \hat c_0 + \hat c_1 (x - \bar x), \]

where

\[ \hat c_0 = \bar y, \qquad \hat c_1 = (\bar y_2 - \bar y_1)/(x_2 - x_1). \tag{3.1} \]

For $\sigma = 0$ we know from Section 2 that taking $x_1, x_2$ as the zeros of
$P_2(x) = \tfrac12(3x^2 - 1)$ or of $T_2(x) = 2x^2 - 1$ will minimize respectively

\[ \int_{-1}^{1} [f(x) - Y(x)]^2\,dx, \qquad \max_{(-1,1)} |f(x) - Y(x)|. \]

Of course, in this case we would take n = 2.
If $\sigma \ne 0$ it is a natural extension to try to choose $x_1, x_2$ so as to minimize
respectively the expected mean square error E given by

\[ E = \mathscr{E}\,\tfrac12\int_{-1}^{1} [f(x) - Y(x)]^2\,dx = \tfrac12\int_{-1}^{1} \mathscr{E}[f(x) - Y(x)]^2\,dx \tag{3.2} \]

or the maximum expected squared error

\[ E_{\max} = \max_{(-1,1)} \mathscr{E}[f(x) - Y(x)]^2. \tag{3.3} \]

These criteria are equally applicable to the case where f(x) is a polynomial of
degree $p \le n$ while Y(x) is of degree p - 1, there being n locations. If f(x) and
Y(x) are of the same degree, (3.3) reduces to the minimization of the maximum
variance which has been considered by de la Garza [2], Guest [4], and Kiefer
and Wolfowitz [8].


4. Legendre and Tchebysheff spacing for $\sigma \ne 0$. Before obtaining the abscissae
$x_1, x_2$ satisfying (3.2) or (3.3) we consider briefly the effects of using Legendre
or Tchebysheff spacing when $\sigma \ne 0$. For the former case it is convenient to
express f(x) in terms of Legendre polynomials, viz.,

\[ f(x) = c_0 + c_1 P_1(x) + c_2 P_2(x). \]

Then for any two symmetrical locations $(-x_1 = x_2)$ we have from (3.1)

\[ \mathscr{E}(\hat c_0) = c_0 + c_2 P_2(x_2), \qquad \mathscr{E}(\hat c_1) = c_1 \tag{4.1} \]

and

\[ \operatorname{var} \hat c_0 = \frac{\sigma^2}{n}, \qquad \operatorname{var} \hat c_1 = \frac{\sigma^2}{n x_2^2}, \qquad \operatorname{cov}(\hat c_0, \hat c_1) = 0. \tag{4.2} \]

Thus, if $x_2 = 1/\sqrt{3}$, $\hat c_0$ and $\hat c_1$ are unbiased estimators of $c_0$ and $c_1$. In this case

\[ \mathscr{E}[f(x) - Y(x)] = c_2 P_2(x) \]

and

\[ \int_{-1}^{1} \mathscr{E}[f(x) - Y(x)]\,dx = 0. \tag{4.3} \]

Thus Y(x) may be said to be "unbiased on the average" as an estimator of f(x).
Interchanging the integration and expectation signs in (4.3) we see that the
expected area under Y(x) is equal to the area under f(x), a result which
continues to be true if f(x) is a cubic, in line with the optimal integration properties
of Legendre spacing. With Legendre spacing we have also

\[
\begin{aligned}
\mathscr{E}[f(x) - Y(x)]^2 &= \operatorname{var} \hat c_0 + x^2 \operatorname{var} \hat c_1 + c_2^2 P_2^2(x) \\
&= \tfrac12 \sigma'^2 + \tfrac32 x^2 \sigma'^2 + c_2^2 P_2^2(x),
\end{aligned}
\]

where $\sigma'^2 = 2\sigma^2/n$, so that the expected mean square error is

\[ E_L = \sigma'^2 + \tfrac15 c_2^2. \tag{4.4} \]
The results (4.1), (4.2) but not (4.3) hold also with obvious changes when
f(x) is expressed in terms of Tchebysheff polynomials, viz.,
\[ f(x) = b_0 + b_1 T_1(x) + b_2 T_2(x). \]
In this case $\hat c_0$ and $\hat c_1$ are unbiased estimators of $b_0$ and $b_1$ if $x_2 = 1/\sqrt{2}$.


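5. Optimal spacing with two locations. We consider first the minimization of
E in (3.2) and to this end show that the search for optimal values of $x_1$ and $x_2$
may be confined to the symmetrical spacing $-x_1 = x_2$.

In place of (4.1) and (4.2) we now have

\[ \mathscr{E}(\hat c_0) = c_0 + c_1 \bar x + c_2 \bar P_2(x), \qquad \mathscr{E}(\hat c_1) = c_1 + 3 c_2 \bar x \]

and

\[ \operatorname{var} \hat c_0 = \tfrac12 \sigma'^2, \qquad \operatorname{var} \hat c_1 = \frac{2\sigma'^2}{(x_2 - x_1)^2}, \qquad \operatorname{cov}(\hat c_0, \hat c_1) = 0, \]

where $\bar P_2(x) = \tfrac12[P_2(x_1) + P_2(x_2)]$.

It follows that

\[ \mathscr{E}[f(x) - Y(x)] = c_2[P_2(x) - \bar P_2(x) - 3\bar x(x - \bar x)] \]

and

\[ \mathscr{E}[f(x) - Y(x)]^2 = \tfrac12 \sigma'^2 + \frac{2\sigma'^2}{(x_2 - x_1)^2}(x - \bar x)^2 + \{\mathscr{E}[f(x) - Y(x)]\}^2. \tag{5.1} \]

Hence

\[ E = \tfrac12 \sigma'^2 + \frac{\sigma'^2}{3(x_2 - x_1)^2}\,[(1 - \bar x)^3 + (1 + \bar x)^3] + c_2^2 X, \tag{5.2} \]

where

\[ X = \{\tfrac15 + \bar P_2^2(x) + \tfrac32 \bar x^2[(1 - \bar x)^3 + (1 + \bar x)^3] - 6\bar P_2(x)\,\bar x^2\}. \]

Let $x_2 - x_1 = 2a$; then $|\bar x| \le 1 - a$. Writing also $\bar x = y$, $x_1 = y - a$, $x_2 = y + a$,
we have

\[ X = \tfrac15 + \tfrac14(3y^2 + 3a^2 - 1)^2 + 6y^2 - 9y^2a^2, \]

and for any given a this may be shown to have a single minimum at y = 0
provided $|y| \le 1 - a$, $|a| < 1$. Corresponding to any given a, therefore, X and
hence E are minimized by taking $x_1 = -a$, $x_2 = a$.

From (5.2) we may now write

\[ E = \tfrac12 \sigma'^2 + \frac{\sigma'^2}{6x_2^2} + c_2^2[\tfrac15 + P_2^2(x_2)]. \tag{5.2'} \]

This is to be minimized with respect to $x_2$. Setting $dE/dx_2 = 0$ we find $x_2$ to
be a root of the equation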
TABLE 1
Values of $-x_1 = x_2$, as a function of $b = \sigma'/|c_2|$, giving (i)
generalized Legendre and (ii) generalized Tchebysheff spacing


— 2% 


| 


PR HWWWWNNNHE HER OOOO 
Aw DeOWONE HK BAN ORAW 


| 
| 
| 


N.B. $x_2 = 1$ for $b \ge 4.243$ in (i) and $b \ge 2.121$ in (ii).


\[ x_2^4(3x_2^2 - 1) = a, \tag{5.3} \]

where $a = \sigma'^2/(9c_2^2)$. Thus $x_2$ is a function of a or equivalently, of $b = \sigma'/|c_2|$.
Equation (5.3) is a cubic in $x_2^2$ with only one real root which corresponds to the
required minimum. For $\sigma = 0$, (5.3) gives Legendre spacing. On the other hand,
if $\sigma \ne 0$ but $c_2 = 0$, so that a is infinite, E will be minimized by making $x_2$ as
large as possible, i.e., $x_2 = 1$. In fact, $x_2 = 1$ for a = 2. For a > 2 or $\sigma'^2 > 18c_2^2$
we still take $x_2 = 1$. The dependence of $x_2$ on b is shown in Table 1.

We turn now to the minimization of the maximum expected squared error of
(3.3). In this case also we may take $-x_1 = x_2$. By (5.1) it is therefore required
to maximize

\[ X' = \frac{\sigma'^2 x^2}{2x_2^2} + \tfrac94 c_2^2 (x^2 - x_2^2)^2 \]

with respect to x and subsequently to minimize this maximum with respect to
$x_2$. If we regard X' as a quadratic in $x^2$ for $0 \le x^2 \le 1$, it is clear that its
maximum occurs at $x^2 = 0$ or 1. For $x^2 = 0$, X' increases in $x_2$ from 0 to $(9/4)c_2^2$
while for $x^2 = 1$, X' decreases from $\infty$ to $\tfrac12\sigma'^2$. Thus if $\sigma'^2 \ge (9/2)c_2^2$, then $x_2 = 1$
is the solution. Otherwise the solution is that value of $x_2$ between 0 and 1 which
equalizes X' for $x^2 = 0$ and $x^2 = 1$. This occurs for $x_2^4 - \tfrac12 x_2^2 = a$, so that for
optimal spacing

\[ x_2^2 = \tfrac14\{1 + (1 + 16a)^{1/2}\} \ \text{ or } \ 1, \tag{5.4} \]

whichever is smaller. For $\sigma = 0$, (5.4) gives Tchebysheff spacing. The dependence
of $x_2$ on b in this case is also shown in Table 1.

The two types of spacing may conveniently be referred to as generalized
Legendre and generalized Tchebysheff spacing.
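
The two optimal spacings can be computed directly from b = σ'/|c2|. The following is a minimal sketch, with illustrative function names: the generalized Legendre abscissa is found by bisection on equation (5.3), using the fact that the left-hand side increases with x2 on the interval from 1/√3 to 1, and the generalized Tchebysheff abscissa uses the closed form (5.4).

```python
from math import sqrt


def legendre_spacing(b):
    """x2 for generalized Legendre spacing: root of x2^4 (3 x2^2 - 1) = a, a = b^2/9."""
    a = b * b / 9
    if a >= 2:                       # beyond this point the design sits at the ends
        return 1.0
    lo, hi = 1 / sqrt(3), 1.0        # the left side of (5.3) rises from 0 to 2 here
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid ** 4 * (3 * mid * mid - 1) < a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tchebysheff_spacing(b):
    """x2 for generalized Tchebysheff spacing from (5.4), capped at 1."""
    a = b * b / 9
    return min(1.0, sqrt(0.25 * (1 + sqrt(1 + 16 * a))))
```

With b = 0 the two functions return the Legendre and Tchebysheff abscissae 1/√3 and 1/√2, and `legendre_spacing(1.2)` is close to the value 0.725 quoted in the example of Section 8.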


6. Possible use of more than two locations. Suppose that more than two
locations are available to us and that we fit a least squares straight line to the n
observations. If these are taken at $x_1 \le x_2 \le \cdots \le x_n$ it seems natural to
continue to assume symmetry of spacing, i.e., $x_i = -x_{n-i+1}$ $(i = 1, 2, \ldots, n)$,
so that both $\sum x_i$ and $\sum x_i^3$ vanish. Then

\[ \hat c_0 = \bar y, \qquad \hat c_1 = \sum x_i y_i \Big/ \sum x_i^2, \tag{6.1} \]

\[ \mathscr{E}(\hat c_0) = c_0 + \frac{c_2 \sum P_2(x_i)}{n}, \qquad \mathscr{E}(\hat c_1) = c_1, \tag{6.2} \]

\[ \operatorname{var} \hat c_0 = \frac{\sigma^2}{n}, \qquad \operatorname{var} \hat c_1 = \frac{\sigma^2}{\sum x_i^2}, \qquad \operatorname{cov}(\hat c_0, \hat c_1) = 0. \tag{6.3} \]

Now comparison of (6.2), (6.3) with (4.1), (4.2) shows that the two sets of
equations become identical if $\sum x_i^2 = n x_2^2$. In other words, corresponding to
any symmetrical configuration of n locations, a value $x_2\,(> 0)$ can be found such
that $\tfrac12 n$ observations at each of $\pm x_2$ give estimators $\hat c_0$, $\hat c_1$ with the same
expectations, variances and covariances. It follows that the two spacings are
equivalent from this point of view as well as on the basis of any criteria
depending on the first two moments only, such as (3.2) and (3.3). See also Box and
Draper [7], who obtain similar results on merely taking $\sum x_i = 0$.

In certain situations it is advantageous to vary the independent variable as 
little as necessary. Apart from its convenience the use of two locations will 
obviously be optimal on this score also. Of course, more than two locations are 
necessary to detect departures from linearity in f(z) but this is not our aim 
here. 

As before, we have taken n even which would be the usual situation. However,
if n is odd, the number of locations is reducible to three, and an odd number of
observations has to be taken at x = 0. Clearly, the narrowest spacing is given
by a single observation at x = 0 and $\tfrac12(n - 1)$ observations at each of
$\pm x_2[n/(n - 1)]^{1/2}$.

These equivalence results may be compared with those obtained by Elfving
[3] and de la Garza [2] in the case when the fitted function and the true response
are polynomials of the same degree, so that no bias enters. For $c_2 = 0$ the present
result is a special case of theirs; the equivalence continues to hold for $c_2 \ne 0$
because by (6.2), (6.3) the bias in $\hat c_0$ is, like the variance of $\hat c_1$, a function of
$\sum x_i^2$.
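
The equivalence described in this section depends only on the mean of the squared abscissae. The sketch below, with illustrative names, computes the equivalent two-point abscissa for a symmetric configuration and builds the narrowest three-location design for odd n described above.

```python
from math import sqrt


def equivalent_x2(xs):
    """Two-point abscissa x2 with n * x2^2 equal to the sum of squares of xs."""
    n = len(xs)
    return sqrt(sum(x * x for x in xs) / n)


def narrowest_odd_design(n, x2):
    """For odd n: one observation at 0 and (n-1)/2 at each of +-x2*sqrt(n/(n-1))."""
    outer = x2 * sqrt(n / (n - 1))
    half = (n - 1) // 2
    return [-outer] * half + [0.0] + [outer] * half
```

For the six-point spacing (±0.2, ±0.6, ±1) the equivalent abscissa is about 0.683, and `equivalent_x2(narrowest_odd_design(5, 0.6))` recovers 0.6.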
TABLE 2
$E$ and $E_{\max}$ as functions of $b = \sigma'$, for various spacings
E 


(i) (ii | (iii) (iv) | (v) 

Generalized Legendre 

Legendre | (—1/+/3, 1/3) (-1, 1) (40.2, 40.6, +1) 
| 


I 
= 
? 
= 
= 


————— 





0.20 
0.54 
1.46 


. 
& 8 
Saeweoe 
SS8AIBS85 


-PWOWN eS re Oo © 
DHNeEOCHKANE 
BUISomweoo 
PLSsBstPss 
Swnonawy 
SSLSLES 

a 


—_— 
— 
—— 
3 % 


Ewax 


1 | 
(i) (ii) (iii) (iv) 
Generalized | Tchebysheff _ 
Tchebysheff (—1/+/2, 1/+/2) | (-1, 1) (—1, 0, 1) (40.2, +0.6, +1) 





0.56 | 2.25 1.00 0.64 
1.10 2.43 1.18 1.21 
2.72 2.97 2.05 2.90 
5.42 3.87 4.30 5.73 
9.20 5.76 7.45 9.69 





7. Comparison of $E$ and $E_{\max}$ for various spacings. It is of interest to compare
our two optimal spacings with other simple spacings. The results of a number of
such comparisons are set out in Table 2. For definiteness, and without real loss of
generality, we have taken the true quadratic response as $f(x) = c_0 + c_1 P_1(x) +
P_2(x)$, so that $b = \sigma'$. For various values of $\sigma'$ Table 2 lists E which from (5.2')
and Section 6 is given by

\[ E = \sigma'^2(0.5 + \tfrac16\gamma^{-1}) + (0.45 - 1.5\gamma + 2.25\gamma^2), \]

where $\gamma = \sum x_i^2/n$; and also $E_{\max}$ which is the larger of $0.5\sigma'^2 + 2.25\gamma^2$ and
$0.5\sigma'^2(1 + \gamma^{-1}) + 2.25(1 - \gamma)^2$.
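
The two expressions of this section are simple enough to evaluate for any symmetric spacing. The sketch below assumes, as in the text, that c2 = 1 so that b = σ', and takes γ (the mean of the squared abscissae) as input; the function names are illustrative.

```python
def mean_sq_error(gamma, sigma):
    """Expected mean square error E for c2 = 1, given gamma = sum(x_i^2)/n."""
    return sigma ** 2 * (0.5 + 1 / (6 * gamma)) + (0.45 - 1.5 * gamma + 2.25 * gamma ** 2)


def max_sq_error(gamma, sigma):
    """Maximum expected squared error: the larger of the values at x = 0 and x = +-1."""
    at_centre = 0.5 * sigma ** 2 + 2.25 * gamma ** 2
    at_ends = 0.5 * sigma ** 2 * (1 + 1 / gamma) + 2.25 * (1 - gamma) ** 2
    return max(at_centre, at_ends)
```

For example, the spacing (-1, 1) has γ = 1 and the spacing (-1, 0, 1) has γ = 2/3; with σ' = 0.6 these give maximum expected squared errors of 2.43 and 1.18. Legendre spacing has γ = 1/3, for which `mean_sq_error(1/3, sigma)` reduces to σ'² + 1/5 in agreement with (4.4).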


8. An example. To illustrate Legendre and generalized Legendre spacing we
suppose that the true law under study is

\[ h(x') = 8 - x' + \tfrac{1}{20} x'^2. \tag{8.1} \]

Put $x' = 5 + 5x$ to transform this to

\[ f(x) = \tfrac{17}{4} - \tfrac52 x + \tfrac54 x^2 = \tfrac{14}{3} - \tfrac52 P_1(x) + \tfrac56 P_2(x). \]

Fig. 1. Illustrating Legendre and generalized Legendre spacing

If this function may be observed for only two values of x the closest overall fit in
the case of no random error ($\sigma = 0$) is obtained by taking $-x_1 = x_2 = 1/\sqrt{3}$,
which results in the straight line of approximation

\[ Y(x) = \tfrac{14}{3} - \tfrac52 x. \]

The average error is zero and the mean square error is by (4.4) simply
$c_2^2/5 = 5/36$.
When $\sigma \ne 0$ the optimal spacing is given by Table 1 with $b = 6\sigma'/5$. Thus
for o’ < 4 the spacing is only slightly wider than the Legendre spacing while 
for o’ > 3 the observations should be taken at x = —1, 1. The expected line 
still has slope $c_1 = -5/2$ but is displaced upwards through a vertical distance
$c_2 P_2(x_2)$.

For $\sigma' = 1$ the situation is shown in Figure 1. In this case $x_2 = 0.725$ and
the expected mean square error E = 1.014 by (5.2') or from Table 2 (1.46 ×
$(5/6)^2$). This may be compared with $E_L = 1.139$. For $\sigma' = 2$, we have $x_2 = 0.855$
and E = 3.298, $E_L = 4.139$.
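
The figures quoted for this example can be reproduced from (5.2') and (4.4). In the sketch below the Legendre coefficient c2 is obtained from the coefficient of the quadratic term (0.05) and the half-width of the range (5), using $x^2 = (2P_2(x) + 1)/3$; the function names are illustrative and the abscissae 0.725 and 0.855 are simply taken from the text.

```python
def c2_from_quadratic(coef, half_range):
    """Legendre coefficient c2 after mapping the range onto (-1, 1)."""
    return 2 * coef * half_range ** 2 / 3


def expected_mse(sigma, x2, c2):
    """Expected mean square error (5.2') for symmetric abscissae -x2, x2."""
    p2 = 0.5 * (3 * x2 * x2 - 1)
    return 0.5 * sigma ** 2 + sigma ** 2 / (6 * x2 ** 2) + c2 ** 2 * (0.2 + p2 ** 2)


def legendre_mse(sigma, c2):
    """Expected mean square error (4.4) under Legendre spacing."""
    return sigma ** 2 + c2 ** 2 / 5


c2 = c2_from_quadratic(0.05, 5)          # equals 5/6
e_opt_1 = expected_mse(1.0, 0.725, c2)   # about 1.014
e_leg_1 = legendre_mse(1.0, c2)          # about 1.139
e_opt_2 = expected_mse(2.0, 0.855, c2)   # about 3.298
e_leg_2 = legendre_mse(2.0, c2)          # about 4.139
```

The comparison shows the gain from widening the spacing: with σ' = 2 the optimal abscissae reduce the expected mean square error from about 4.14 to about 3.30.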

In this example we have taken $h(x')$ as known so that the results could be
presented graphically. However, it is clear that the optimal locations are
determined completely by the coefficient of $x'^2$ in (8.1), the specified range of $x'$
and the standard deviation $\sigma'$. Thus the same results hold approximately when
all that is known is that the response function is linear in the range (0, 10) apart
from a quadratic term with coefficient of the order 0.05.
A. H. E. Grandage and R. J. Hader
North Carolina State College 


1. Introduction. This paper considers a problem arising in the design of ex- 
periments for empirically investigating the relationship between a dependent 
and several independent variables, all variables being continuous. It is assumed 
that the form of the functional relationship is unknown but that within the range 
of interest, the function may be represented by a Taylor series expansion of 
moderately low order. Specifically, the problem considered herein is that choice 
of combinations of levels of the independent variables which, a) will enable an 
experimenter to approximate a functional relationship by fitting a Taylor series 
expansion through terms of order 3, by the method of least squares, and b) will 
have the property of rotatability. Such a choice of combinations of levels of the 
independent variables will be called a third order rotatable design. 

For the sake of brevity, the abbreviation dth ORD will be used to denote 
dth order rotatable design. 


2. Rotatability. The property of rotatability as a desirable quality of an experi- 
mental design was first advanced by Box and Hunter in [1]. This property is 
that the variances of estimates of the response made from the least squares 
estimates of the Taylor series are constant on circles, spheres or hyper-spheres 
about the center of the design. Thus, a rotatable design, that is, a design which 
achieves this property, could be rotated through any angle around its center and 
the variances of responses estimated from it would be unchanged. 

Box and Hunter proved that a necessary and sufficient condition for a design 
of order d (d = 1, 2, 3,--- ) to be rotatable is that the moments of the inde- 
pendent variables be the same, through order 2d, as those of a spherical dis- 
tribution, or that these moments be invariant under a rotation of the design 
around its center. 

Let k be the number of independent variables, or factors, and let
$x_{1u}, x_{2u}, \ldots, x_{ku}$ be the levels of these variables for the uth experimental point
in the factor space, $(u = 1, 2, \ldots, N)$. Then a pth order moment is defined as

\[ N^{-1} \sum_{u=1}^{N} x_{1u}^{q} x_{2u}^{r} \cdots x_{ku}^{t}, \]

where $0 \le q$, $0 \le r$, ..., $0 \le t$, and $q + r + \cdots + t = p$. Further, let the
independent variables be standardized so that

\[ \sum_{u=1}^{N} x_{iu}^2 = N \qquad (i = 1, 2, \ldots, k). \tag{2.1} \]

Let $\eta_u$ be the expectation of the response at the uth experimental point. For
a polynomial equation of third order this may be written

\[ \eta_u = \beta_0 + \sum_{i=1}^{k} \beta_i x_{iu} + \sum_{i \le j = 1}^{k} \beta_{ij} x_{iu} x_{ju} + \sum_{i \le j \le l = 1}^{k} \beta_{ijl} x_{iu} x_{ju} x_{lu} \tag{2.2} \]

or in vector notation as

\[ \eta = X\beta. \tag{2.3} \]

(For what is to follow, the order of the terms in (2.3) is different from that in 
(2.2).) If the (N X L) matrix X is defined as 


(2.4) 


where $L = \binom{k+3}{3}$, the number of terms in (2.2), and if X' is the transpose of
X, then $N^{-1}(X'X)$ is the moment matrix of a configuration of N points in the
k-dimensional factor space.


For the configuration of N points, or the design, to be rotatable, N~'(X’X) 
must satisfy 


i I ee 
MZ Q QO 
Ki OQ 
K, 
(2.5) N~(X'X) = 


(symmetric) Ey 


in which the submatrices are defined as follows: 


2 
v1 
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3 2 2 2 
Ze 36SEC BB Vi Be Ui Le 


LON he ee te Mm 
15\6 3N6 SNe es ONG 








B\e Ne ttt OG 
(2.7) Ki Be bes ~ ; (¢ = 1,2,---,k) 
3X6 
U1 XY Xs Tr-1 Tr 
rr, @ --- 0} 
eer 
(28) MT = | ; ie 
| anal 
L ri J 
Li Le Xs %I%2% °° Tk-2 Th-1 Uk 
rh 0 kei 0 7 
Xe eee 0 
(2.9) A] = | 
sented 
L 3 


The headings at the top of the matrices in (2.6) through (2.9) are intended 
to indicate the form of the elements in the matrices; they are not the vectors 
of (2.4). The reader will note that the arrangement of the moment matrix (2.5) 
is different from the arrangement of the second order moment matrix in [1]. 
(2.5) is written in this form to point out the amount of orthogonality present 
and to facilitate the calculation of the inverse. 

In (2.5), 0 denotes a null matrix of appropriate size and in (2.7) the column 
and row corresponding to x? appears only once and always in the second position. 

The constants, $\lambda_4$ and $\lambda_6$, must satisfy the restrictions

\[ \lambda_4 > \frac{k}{k + 2}, \tag{2.10} \]

\[ \lambda_6 > \frac{\lambda_4^2 (k + 2)}{k + 4}, \tag{2.11} \]

if (2.5) is to be positive definite.
The criterion of rotatability for a third order design is characterized mathe- 
matically by equation (2.5) with its attendant restrictions, equations (2.10) 
and (2.11). To find a 3rd ORD in k factors, one must discover a set of
combinations of factor levels whose moments are those of equation (2.5).
The inverses of the submatrices in (2.5) are 
[> c 
d 
| 


eee c} 





xX e® 6 
® 


G'=A 








L al 


Qf 
h 
| 
| 
| 


Siac 
SS sa 





= Bl 


u wi 


(Ml) = T/\q (Ae) = No 


in which 
b=2(k+2)N c= —-2y d=(k+1)y—k+1 
e=1-2z f = 6(k+1r)\ 


2 
g=-64y h=k+1—(k—1)rj/s m=3(¥—1) 


w = 3lk +3 — (k + 1)Ai/Ad 


and where A and B are given by 
(2.12) 1/A = 2f(k + 2) — K] 
(2.13) 1/B = 6[(k + 4)de — (k + 2)d4). 


3. Third order rotatable designs in two factors. Consider an arrangement of n 
points equally spaced on a circle in a two dimensional factor space. In reference 
[1], Box and Hunter prove that n > 2d is a sufficient condition for all moments 
through order 2d of the coordinates of these points to be invariant under rotation. 
That is, n > 2d is sufficient for the arrangement to be rotatable of order d. 
We shall prove the necessity of this condition as well. 

We shall use a theorem given by Bose and Carter in reference [3], an earlier
version of which was stated by Carter in [4]. Let $(x_{1u}, x_{2u})$ $(u = 1, 2, \ldots, n)$
be the n points of any arrangement A (which may be a design) in the space
of $x_1$ and $x_2$. Denote by $\alpha(x_{1u})$, $\alpha(x_{2u})$ the coordinates of the point $(x_{1u}, x_{2u})$
after a rotation about the origin through a fixed angle $\alpha$. From Section 2, it is
clear that the arrangement A is rotatable of order d if and only if, for any rotation
$\alpha$ performed on all n points of A,

\[ \sum_{u=1}^{n} \alpha^{q}(x_{1u})\,\alpha^{r}(x_{2u}) = \sum_{u=1}^{n} x_{1u}^{q} x_{2u}^{r} \quad \text{for } 0 \le q,\ 0 \le r,\ 0 < q + r \le 2d. \tag{3.1} \]


The Bose-Carter theorem proceeds as follows: Let $z_u = x_{1u} + i x_{2u}$ and $\alpha(z_u) =
z_u e^{i\alpha}$. Also let $\bar z_u$ be the complex conjugate of $z_u$, and $\bar z_u e^{-i\alpha}$ the complex
conjugate of $\alpha(z_u)$. Put $q + r = p$. Then we may write


n n nm 
Q¢ 97? = = 7? zt 
(3.2) Dorf = 27° Dd (auth) (a—e) = 2” YS madi vz, 
um] u=l s+t=—p u=) 


where the m,; are sums of combinatorial constants, some of which may be zero. 
Similarly 


(3.3) >, a®(a1,)a" (rau) = 2°" 2, Met frre. 2uz4. 
ual s+t=p u=l 
From (3.2) and (3.3) we see that to satisfy (3.1) it is sufficient that

\[ \sum_{u=1}^{n} z_u^{s} \bar z_u^{t} = 0 \quad \text{for } 0 \le s,\ 0 \le t,\ 0 < s + t \le 2d \text{ unless } s = t. \tag{3.4} \]
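Since
\[ z^{s} \bar z^{t} = (x_1 + i x_2)^{s}(x_1 - i x_2)^{t} = \sum_{q+r=s+t} n_{qr}\, x_1^{q} x_2^{r}, \]
then
\[ e^{i\alpha(s-t)} \sum_{u=1}^{n} z_u^{s} \bar z_u^{t} = \sum_{q+r=s+t} n_{qr} \sum_{u=1}^{n} \alpha^{q}(x_{1u})\,\alpha^{r}(x_{2u}). \]
Hence, if the arrangement A is rotatable of order d (i.e., if (3.1) holds), then

\[ e^{i\alpha(s-t)} \sum_{u=1}^{n} z_u^{s} \bar z_u^{t} = \sum_{q+r=s+t} n_{qr} \sum_{u=1}^{n} x_{1u}^{q} x_{2u}^{r} \]

is independent of $\alpha$, from which it follows that (3.4) must hold. Thus (3.4) is
necessary and sufficient in order that A be rotatable of order d. This is a
statement of the theorem of Bose and Carter.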

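Now let the n points $(x_{1u}, x_{2u})$ be points equally spaced on a circle of radius
$\rho$. Then we may put

\[ x_{1u} = \rho \cos(2\pi v/n), \qquad x_{2u} = \rho \sin(2\pi v/n), \]

whence

\[ z_u = \rho e^{2\pi i v/n}, \qquad \bar z_u = \rho e^{-2\pi i v/n}, \qquad v = 0, 1, \ldots, n - 1. \]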
The arrangement consisting of these n points is rotatable of order d if and only
if

\[ \sum_{u=1}^{n} z_u^{s} \bar z_u^{t} = \rho^{s+t} \sum_{v=0}^{n-1} e^{2\pi i v(s-t)/n} = 0 \quad \text{for } 0 \le s,\ 0 \le t,\ 0 < s + t \le 2d,\ s \ne t, \tag{3.5} \]

which is a corollary of (3.4). By a well known theorem on the roots of unity
$\sum_{v=0}^{n-1} e^{2\pi i v(s-t)/n} = 0$ if and only if s - t is not an integer multiple of n. One sees
immediately that s - t cannot be an integer multiple of n if s + t < n and that
s - t will be such a multiple for some s and t if $s + t \ge n$. Since (3.5) should
be satisfied for any non-negative s, t with $0 < s + t \le 2d$, then n > 2d is
necessary and sufficient for equally spaced points on a circle to be rotatable of order d.
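
The condition n > 2d can be checked numerically by evaluating the power sums in (3.4) for equally spaced points on a circle. The following is a small sketch using complex arithmetic; the function names and the tolerance are choices made for this illustration.

```python
import cmath


def circle_points(n, rho=1.0):
    """n equally spaced points on a circle of radius rho, as complex numbers."""
    return [rho * cmath.exp(2j * cmath.pi * v / n) for v in range(n)]


def satisfies_3_4(points, d, tol=1e-9):
    """True when sum(z^s * conj(z)^t) vanishes for 0 < s + t <= 2d with s != t."""
    for s in range(2 * d + 1):
        for t in range(2 * d + 1 - s):
            if s == t:
                continue
            total = sum(z ** s * z.conjugate() ** t for z in points)
            if abs(total) > tol:
                return False
    return True
```

Seven points on a circle pass the test for d = 3, whereas six points fail because the sum with s = 6, t = 0 equals 6 rather than zero.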

Equation (2.5), then, is satisfied by n > 6 points equally spaced on a circle.
But it may be verified that for this arrangement

\[ \lambda_4 = \frac{n \sum_{u} x_{1u}^2 x_{2u}^2}{\left(\sum_{u} x_{1u}^2\right)^2} = \frac12, \]

which does not satisfy (2.10). Therefore, these points do not constitute a
rotatable design. If $n_1$ is the number of points on the circle and $n_2$ points are added
at the center, $\lambda_4$ becomes

\[ \lambda_4 = \frac{(n_1 + n_2)\rho^4 n_1/8}{(n_1\rho^2/2)^2} = \frac12\left[1 + \frac{n_2}{n_1}\right] > \frac12, \]

which satisfies (2.10), but then

\[ \frac{\lambda_6}{\lambda_4^2} = \frac23 \]

and (2.11) is not satisfied. Equally spaced points on a circle with additional
points at the center, then, do not constitute a 3rd ORD.

Now consider an arrangement of N points on two concentric circles with
$n_1$ points equally spaced on a circle of radius $\rho_1$ and $n_2$ points equally spaced
on a circle of radius $\rho_2$, where $n_1 + n_2 = N$, $\rho_1 \ne \rho_2$, $\rho_1 > 0$, $\rho_2 > 0$. We shall
prove that the arrangement consisting of these $n_1 + n_2$ points is rotatable of
order d if and only if both $n_1 > 2d$ and $n_2 > 2d$.

In the same manner as before take the first $n_1$ points as

\[ \rho_1 e^{2\pi i v/n_1} \qquad (v = 0, 1, \ldots, n_1 - 1). \]


To allow the second n, points to take any orientation with respect to the n; points, 
take them as e“p,e""""'"* (v = 0, 1, --- , m2 — 1). The arrangement is rotatable 
of order d if and only if 


"at n2—1 
(3.6) a > Grwe—Gieg 4 G4" eriv(e—Oins fe 
_ r==(0 


v 








1088 D. A. GARDINER, A. H. E. GRANDAGE, AND R. J. HADER 


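for $0 \le s$, $0 \le t$, with $0 < s + t \le 2d$ unless s = t. But the sums in (3.6) are
0 or $n_1$ and 0 or $n_2$ respectively. Hence (3.6) holds if and only if both sums are
zero. In order that this be true we know that $n_1 > 2d$ and $n_2 > 2d$ is necessary
and sufficient.

It is easily shown that this type of arrangement provides a 3rd ORD if $n_1$,
$n_2 > 6$. For then

\[ \lambda_4 = \frac{N(n_1\rho_1^4 + n_2\rho_2^4)/8}{[n_1\rho_1^2 + n_2\rho_2^2]^2/4} = \frac12 \cdot \frac{n_1^2\rho_1^4 + n_2^2\rho_2^4 + n_1 n_2(\rho_1^4 + \rho_2^4)}{n_1^2\rho_1^4 + n_2^2\rho_2^4 + 2 n_1 n_2 \rho_1^2\rho_2^2}, \]

and since $\rho_1^4 + \rho_2^4 > 2\rho_1^2\rho_2^2$, for $\rho_1 \ne \rho_2$, $\lambda_4 > \tfrac12$. Therefore, (2.10) is satisfied.
Similarly

\[ \lambda_6 = \frac{N^2(n_1\rho_1^6 + n_2\rho_2^6)/16}{3[n_1\rho_1^2 + n_2\rho_2^2]^3/8} > \frac23 \lambda_4^2 \]

since

\[ \frac{\lambda_6}{\lambda_4^2} = \frac23 \cdot \frac{n_1^2\rho_1^8 + n_2^2\rho_2^8 + n_1 n_2 \rho_1^2\rho_2^2(\rho_1^4 + \rho_2^4)}{n_1^2\rho_1^8 + n_2^2\rho_2^8 + n_1 n_2 \rho_1^4\rho_2^4(2)} \]

which is greater than $\tfrac23$ for $\rho_1 \ne \rho_2$ and so (2.11) is satisfied, also.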

Thus, it has been shown that a simple class of 3rd ORDs in two factors exists. 
This class consists of designs which have seven or more points equally spaced 
on each of two concentric circles. Each of the circles may be rotated inde- 
pendently of the other and therefore there are an infinite number of configurations 
possible for designs with a given $n_1$ and $n_2$. Since points located at the center of
the circles do not disturb the moment properties of the configuration, these may
be added at will to achieve variations in the parameters $\lambda_4$ and $\lambda_6$.
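
The two ratios discussed in this section can be computed directly from coordinates. The following sketch is illustrative only: it assumes that, for two factors, (2.10) and (2.11) are the inequalities $\lambda_4/\lambda_2^2 > 1/2$ and $\lambda_6\lambda_2/\lambda_4^2 > 2/3$ quoted above, and that $\lambda_2$, $\lambda_4$, $\lambda_6$ are defined through $\sum x_{1u}^2 = N\lambda_2$, $\sum x_{1u}^2x_{2u}^2 = N\lambda_4$ and $\sum x_{1u}^4x_{2u}^2 = 3N\lambda_6$. All function names are hypothetical.

```python
import math


def circle_points(n, rho, phase=0.0):
    """n equally spaced points on a circle of radius rho."""
    return [(rho * math.cos(phase + 2 * math.pi * v / n),
             rho * math.sin(phase + 2 * math.pi * v / n)) for v in range(n)]


def ratios(points):
    """Return (lambda4 / lambda2^2, lambda6 * lambda2 / lambda4^2) for k = 2."""
    N = len(points)
    lam2 = sum(x * x for x, _ in points) / N
    lam4 = sum(x * x * y * y for x, y in points) / N
    lam6 = sum(x ** 4 * y ** 2 for x, y in points) / (3 * N)
    return lam4 / lam2 ** 2, lam6 * lam2 / lam4 ** 2


if __name__ == "__main__":
    one_circle = circle_points(7, 1.0) + [(0.0, 0.0)] * 2
    two_circles = circle_points(7, 1.0) + circle_points(8, 2.0, phase=0.3)
    print(ratios(one_circle))    # second ratio sits at the boundary 2/3
    print(ratios(two_circles))   # both ratios exceed their bounds
```

With center points alone the second ratio stays at the boundary value, while a second circle of different radius moves it strictly above $2/3$, which is the point of the argument above.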


4. Sequential third order rotatable designs in two factors. A 3rd ORD of the
type described in the previous section may be performed in two "blocks." By
judicious selection of $\rho_1$ and $\rho_2$, the radii of the two circles, the coefficients in
the Taylor series expansion may be estimated independently of the block effects.
If one block of points is a complete 2nd ORD and the second block consists of
additional points necessary to make the whole a 3rd ORD, the design may be
called sequential, in that an experimenter need not perform the second block of
points if he feels the first block has given him an adequate approximation to the
phenomenon.

Suppose the first block consists of seven or more points equally spaced on a
circle with some points at the center. This allows the estimation of polynomial
coefficients up to and including the second order. Now add a second block
consisting of seven or more points equally spaced on a circle of different radius from
the first. Let $n_1$ be the number of points in the first block and $n_2$ the number
of points in the second block. Let $\delta_1$ be the effect of the first block, $\delta_2$ the effect
of the second block, and let $Z_{wu} = 1$ if the $u$th observation occurs in the $w$th
block, $w = 1, 2$, and $Z_{wu} = 0$ otherwise. Then, the expectation of the $u$th
observation can be written

(4.1)   $$\eta_u = \beta_0 + \sum_i \beta_i x_{iu} + \sum_{i \le j} \beta_{ij} x_{iu}x_{ju} + \sum_{i \le j \le l} \beta_{ijl} x_{iu}x_{ju}x_{lu} + \sum_w \delta_w (Z_{wu} - \bar{Z}_w)$$

in which $\bar{Z}_w = \sum_u Z_{wu}/N$, and $N = n_1 + n_2$.
If the estimates of the block effects are to be independent of the estimates of
the polynomial coefficients, it is required that

(4.2)   $\sum_u (Z_{wu} - \bar{Z}_w) = 0$
(4.3)   $\sum_u (Z_{wu} - \bar{Z}_w)x_{iu} = 0$
(4.4)   $\sum_u (Z_{wu} - \bar{Z}_w)x_{iu}x_{ju} = 0$
(4.5)   $\sum_u (Z_{wu} - \bar{Z}_w)x_{iu}x_{ju}x_{lu} = 0$

for $w = 1, 2$ and $i, j, l = 1, 2$. (4.2) is satisfied by the definition of $\bar{Z}_w$, while
(4.3), (4.4) and (4.5) are satisfied with one exception, by the fact that $Z_{wu} - \bar{Z}_w$
is constant within blocks and each block contains a rotatable arrangement of
points. The exception is in (4.4) when $i = j$. For this case, if $n_{01}$ = the number
of points at the center in the first block, and $n_{02}$ = the number of points at the
center in the second block, (4.4) becomes

$$\left[1 - \frac{n_1}{N}\right][n_1 - n_{01}]\frac{\rho_1^2}{2} + \left[-\frac{n_1}{N}\right][n_2 - n_{02}]\frac{\rho_2^2}{2} = 0$$

or

(4.6)   $$\frac{\rho_2^2}{\rho_1^2} = \frac{n_2(n_1 - n_{01})}{n_1(n_2 - n_{02})}.$$

Therefore, by selecting $\rho_2$, the radius of the circle in the second block, in
accordance with (4.6) the experiment may be performed sequentially and
estimates of polynomial coefficients will be free of block effects. It is interesting to
note that (4.6) is independent of the number of points in the second block if
$n_{02} = 0$, it being required only that $n_2 > 6$. A 3rd ORD with these blocking
properties is not possible, however, if $n_2 n_{01} = n_1 n_{02}$, since (4.6) then gives $\rho_2 = \rho_1$.
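
A small numerical sketch of this rule may help. It assumes (4.6) reads $\rho_2^2/\rho_1^2 = n_2(n_1 - n_{01})/[n_1(n_2 - n_{02})]$, as follows from (4.4) with $i = j$; the function names and the example block sizes are hypothetical.

```python
import math


def second_radius(rho1, n1, n01, n2, n02):
    """Radius of the second circle required by (4.6) for orthogonal blocking."""
    if n2 * n01 == n1 * n02:
        raise ValueError("n2*n01 == n1*n02 forces rho2 == rho1: no 3rd ORD")
    return rho1 * math.sqrt(n2 * (n1 - n01) / (n1 * (n2 - n02)))


def block_contrast(rho1, n1, n01, rho2, n2, n02):
    """Left side of (4.4) for i = j and w = 1; zero when blocking is orthogonal."""
    N = n1 + n2
    s1 = (n1 - n01) * rho1 ** 2 / 2   # sum of x_iu^2 over block 1
    s2 = (n2 - n02) * rho2 ** 2 / 2   # sum of x_iu^2 over block 2
    return (1 - n1 / N) * s1 - (n1 / N) * s2


if __name__ == "__main__":
    # Block 1: 7 circle points + 3 center points; block 2: 8 circle points.
    rho2 = second_radius(1.0, n1=10, n01=3, n2=8, n02=0)
    print(rho2, block_contrast(1.0, 10, 3, rho2, 8, 0))
```

When $n_{02} = 0$ the factor $n_2$ cancels, so the computed radius does not change with the size of the second block.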

The 3rd ORD may be sequentialized in three stages with a total of either
three or four blocks: Block I would consist of $n_1/2$ points of which $n_{01}/2$ are
central points and such that $(n_1 - n_{01})/2$ is an integer greater than or equal to
4. The $(n_1 - n_{01})/2$ points would be equally spaced on a circle of radius $\rho_1$,
and the $(n_1 - n_{01})/2$ points would constitute a 1st ORD. Block II would be
identical with Block I with the points superposed so that Blocks I and II jointly
would have $n_1 - n_{01}$ points equally spaced on a circle of radius $\rho_1$ and $n_{01}$ points
at the center and thus would form a complete 2nd ORD.

The third stage, Block III, would consist of $n_2 - n_{02}$ points, greater than 6,
equally spaced on a circle of radius $\rho_2$ (where $\rho_2$ is determined from (4.6)) and
$n_{02}$ points at the center, in a three block design. Blocks I, II, and III would
make up a complete 3rd ORD.

If the experiment were to be sequentialized in three stages and four blocks,
the third stage would be constructed of two blocks similar to Blocks I and II,
but with radius $\rho_2$, and with the possibility of no central points.


5. Third order rotatable designs in three factors (non-sequential). A 3rd
ORD in three factors may be formed from the points at the vertices of a cube,
two octahedra of different radii, and a cuboctahedron, all oriented symmetrically
to one another. The coordinates of the points of the cube can be represented by
all possible permutations of the elements of the vector $(\pm a, \pm a, \pm a)$; of one
octahedron by the permutations of the elements of $(\pm 1.82969a, 0, 0)$; of the
other octahedron by the permutations of the elements of $(\pm 1.16343a, 0, 0)$;
and of the cuboctahedron by the 12 permutations of the elements of $(\pm a2^{1/3},
\pm a2^{1/3}, 0)$. The value of $a$ is the scaling factor chosen so that $\sum_{u=1}^{N} x_{iu}^2 = N$, the
total number of points. The constants 1.82969, 1.16343, and $2^{1/3}$ are those which
will satisfy the moment requirements inherent in equations (2.6) and (2.7) for
this composite configuration. The parameters for this design are given below
for various numbers of points added at the center of the design. Also given are
the values $(5/7)\lambda_4^2$, which, in accordance with (2.11), must always be exceeded
by $\lambda_6$.


   N    No. of Center Points    $\lambda_4$    $\lambda_6$    $5\lambda_4^2/7$

   32            0              .638    .300    .291
   33            1              .658    .319    .309
   34            2              .678    .339    .328
   35            3              .698    .359    .348
   36            4              .718    .380    .368
   37            5              .738    .402    .389
   38            6              .758    .423    .410
   39            7              .778    .446    .432
   40            8              .798    .469    .455


Another 3rd ORD of the non-sequential type can be formed by orienting an
icosahedron of radius $a$ symmetrically with respect to a dodecahedron of radius
$1.11236224a$, with or without central points. But with 0 to 8 central points
$\lambda_6 - (5/7)\lambda_4^2$ is never greater than 0.000061, so that this design could not be
recommended.
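
The composite cube, two-octahedra and cuboctahedron design of this section can be checked numerically. The sketch below is an illustration under stated assumptions: the cuboctahedron coordinate is taken as $2^{1/3}a$, the scaling is $\sum x_{iu}^2 = N$, and $\lambda_4$ and $\lambda_6$ are read off as the averages of $x_1^2x_2^2$ and $x_1^2x_2^2x_3^2$. The helper names are hypothetical.

```python
import itertools
import math


def signed_perms(vec):
    """Distinct points obtained by permuting vec and changing signs."""
    pts = set()
    for perm in itertools.permutations(vec):
        for signs in itertools.product((1, -1), repeat=len(vec)):
            pts.add(tuple(s * v for s, v in zip(signs, perm)))
    return sorted(pts)


def design(n_center=0):
    """Cube + two octahedra + cuboctahedron, scaled so that sum x_1u^2 = N."""
    u = 2 ** (1 / 3)
    pts = (signed_perms((1.0, 1.0, 1.0))
           + signed_perms((1.82969, 0.0, 0.0))
           + signed_perms((1.16343, 0.0, 0.0))
           + signed_perms((u, u, 0.0))
           + [(0.0, 0.0, 0.0)] * n_center)
    a = math.sqrt(len(pts) / sum(p[0] ** 2 for p in pts))
    return [tuple(a * c for c in p) for p in pts]


def moment(pts, i, j, k):
    """Average of x^i * y^j * z^k over the design points."""
    return sum(x ** i * y ** j * z ** k for x, y, z in pts) / len(pts)


if __name__ == "__main__":
    for n0 in range(9):
        pts = design(n0)
        lam4, lam6 = moment(pts, 2, 2, 0), moment(pts, 2, 2, 2)
        print(len(pts), n0, round(lam4, 3), round(lam6, 3), round(5 * lam4 ** 2 / 7, 3))
```

The printed rows can be compared with the table above; small differences in the last digit may arise from the rounding of the published constants.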


6. Third order rotatable designs in three factors (sequential). Of greater
interest than a 3rd ORD per se is the 3rd ORD which can be performed
sequentially, and particularly those sequential designs in three factors which may
be extended to higher dimensions.

Consider the sequential design as being performed in two parts: the first part
to be a 2nd ORD and the second part a set of points which, when added to the
first part, makes a 3rd ORD. If the second order moment properties are to be
preserved after the addition of the second set of points, it is obvious that both
parts of the design must be complete 2nd ORDs in themselves.

For the first of these 3 dimensional sequential designs consider the design 
whose initial portion is the cube + octahedron configuration with points at the 
center. By adding to this, the points of a truncated cube and of another octa- 
hedron, a 3rd ORD results. This design would not be recommended in practice 
because, like the icosahedron-dodecahedron design of the previous section, the 
resultant matrix of normal equations is poorly conditioned. That is, although the 
inequalities (2.10) and (2.11) are satisfied, (2.11) is very close to an equality. 
Consequently, the linear and cubic coefficients are very poorly estimated. The 
design is presented here because it provides a basis for a more useful design 
which follows. 

The coordinates of a truncated cube in 3 dimensions can be written as all 24
permutations of the elements of the vector $(\pm c, \pm d, \pm d)$, where the radius of
the figure is given by $\rho^2 = c^2 + 2d^2$ and where $c$ is a measure of the amount of
truncation. For example, if $c = d$ the figure is not truncated at all and the 24
points make up a triply replicated cube. If $c = 0$, the truncation is extreme and
the figure becomes a doubly replicated cuboctahedron. It can be shown that if

$$c^2 = \frac{5 + 2\sqrt{10}}{15}\,\rho^2$$

the 24 points constitute a 2nd ORD, but this value of $c$ is not helpful in
constructing a 3rd ORD.
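
The stated value of $c$ can be verified directly. The sketch below assumes the second order condition to be checked is $\sum x_{iu}^4 = 3\sum x_{iu}^2x_{ju}^2$, and that $c^2 = (5 + 2\sqrt{10})\rho^2/15$; the function names are hypothetical.

```python
import itertools
import math


def truncated_cube(rho=1.0):
    """The 24 points (+-c, +-d, +-d) and permutations, with c^2 = (5 + 2*sqrt(10)) rho^2 / 15."""
    c = rho * math.sqrt((5 + 2 * math.sqrt(10)) / 15)
    d = math.sqrt((rho ** 2 - c ** 2) / 2)
    pts = set()
    for perm in set(itertools.permutations((c, d, d))):
        for signs in itertools.product((1, -1), repeat=3):
            pts.add(tuple(s * v for s, v in zip(signs, perm)))
    return sorted(pts)


def second_order_excess(pts):
    """sum x^4 - 3 * sum x^2 y^2; zero for a second order rotatable arrangement."""
    return sum(x ** 4 for x, _, _ in pts) - 3 * sum(x * x * y * y for x, y, _ in pts)


if __name__ == "__main__":
    pts = truncated_cube(1.0)
    print(len(pts), second_order_excess(pts))
```

Since all 24 points lie on one sphere, center points would still be needed to turn this arrangement into a non-singular design.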

For the first portion of this sequential design let the cube have radius $\rho_1$ and
an octahedron have radius

$$\rho_2 = 2^{3/4}\rho_1/3^{1/2}.$$

Box and Hunter [1] show that this arrangement of 14 points comprises a 2nd
ORD. To this, as the second portion of the sequential design, add a truncated
cube of radius $\rho_3$ and coordinates $(\pm c, \pm d, \pm d)$ and another octahedron with
coordinates $(\pm\rho_4, 0, 0)$. Then it can be shown that if $\rho_1 = \sqrt{3}$, $\rho_2 = 2^{3/4}$,
$\rho_3 = 1.657765$, $\rho_4 = 1.705945$, $c = 0.184388$, $d = 1.164944$, a 3rd ORD results.
But for 0 to 10 central points $\lambda_6 - 5\lambda_4^2/7$ is never larger than .0005.

However, the difficulty of the ill-conditioned matrix may be avoided by
modifying the design slightly. This is accomplished by using a cube and "doubled
octahedron," instead of a cube and octahedron in the first stage of
experimentation. By doubled octahedron is meant two experimental points at each vertex of
an octahedron. Let $\rho_1$, $\rho_2$, $\rho_3$ and $\rho_4$ designate the radii of a cube, doubled
octahedron, truncated cube and octahedron respectively. If the first portion,
consisting of the cube and doubled octahedron is to be second order rotatable, then
$\rho_2 = \rho_1\sqrt{2/3}$. The second portion of the design consisting of the truncated
cube and the other octahedron must have dimensions satisfying the equations

(6.1)   $\rho_4^4 = \rho_3^4 + 10c^2\rho_3^2 - 15c^4$
(6.2)   $2\rho_4^6 = -21c^6 + 9c^4\rho_3^2 + c^2\rho_3^4 + 3\rho_3^6$
(6.3)   $\tfrac{16}{27}\rho_1^6 = \rho_3^6 - 19c^2\rho_3^4 + 39c^4\rho_3^2 - 21c^6$

which, in turn, will satisfy equation (2.5). If $\rho_1$ again equals $\sqrt{3}$ and hence
$\rho_2 = \sqrt{2}$, then an admissible solution of (6.1), (6.2) and (6.3) is $\rho_3 = 1.851208$,
$\rho_4 = 1.985406$, $c = 0.341564$. The coordinates of the resultant design are
(for the first stage of experimentation)
the 8 permutations of $(\pm a, \pm a, \pm a)$,
the 6 permutations of $(\pm a\sqrt{2}, 0, 0)$,
the 6 permutations of $(\pm a\sqrt{2}, 0, 0)$ again,
and, if desired, points with coordinates (0, 0, 0);
(for the second stage of experimentation)
the 24 permutations of $(\pm .341564a, \pm 1.286527a, \pm 1.286527a)$,
the 6 permutations of $(\pm 1.985406a, 0, 0)$,
and, if desired, points with coordinates (0, 0, 0),
where, again, $a$ is chosen so that $\sum_{u=1}^{N} x_{iu}^2 = N$.
The values of the parameters for 0 to 10 central points are:

   N    No. of Center Points    $\lambda_4$     $\lambda_6$     $5\lambda_4^2/7$

   50            0              .6271    .2902    .2809
   51            1              .6396    .3019    .2922
   52            2              .6522    .3139    .3038
   53            3              .6647    .3261    .3156
   54            4              .6773    .3385    .3277
   55            5              .6898    .3511    .3399
   56            6              .7023    .3640    .3523
   57            7              .7149    .3771    .3651
   58            8              .7274    .3905    .3779
   59            9              .7400    .4041    .3911
   60           10              .7525    .4179    .4045
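
The tabulated parameters can be reproduced from the listed coordinates. The sketch below is illustrative and rests on assumptions: the design is scaled so that $\sum x_{iu}^2 = N$, and $\lambda_4$ and $\lambda_6$ are taken as the averages of $x_1^2x_2^2$ and $x_1^2x_2^2x_3^2$. The helper names are hypothetical.

```python
import itertools
import math


def signed_perms(vec):
    """Distinct points obtained by permuting vec and changing signs."""
    pts = set()
    for perm in itertools.permutations(vec):
        for signs in itertools.product((1, -1), repeat=len(vec)):
            pts.add(tuple(s * v for s, v in zip(signs, perm)))
    return sorted(pts)


def sequential_design(n_center=0):
    """Cube + doubled octahedron, then truncated cube + octahedron, scaled to sum x^2 = N."""
    stage1 = signed_perms((1.0, 1.0, 1.0)) + 2 * signed_perms((math.sqrt(2), 0.0, 0.0))
    stage2 = (signed_perms((0.341564, 1.286527, 1.286527))
              + signed_perms((1.985406, 0.0, 0.0)))
    pts = stage1 + stage2 + [(0.0, 0.0, 0.0)] * n_center
    a = math.sqrt(len(pts) / sum(p[0] ** 2 for p in pts))
    return [tuple(a * c for c in p) for p in pts]


def moment(pts, i, j, k):
    """Average of x^i * y^j * z^k over the design points."""
    return sum(x ** i * y ** j * z ** k for x, y, z in pts) / len(pts)


if __name__ == "__main__":
    for n0 in range(11):
        pts = sequential_design(n0)
        lam4, lam6 = moment(pts, 2, 2, 0), moment(pts, 2, 2, 2)
        print(len(pts), n0, round(lam4, 4), round(lam6, 4), round(5 * lam4 ** 2 / 7, 4))
```

The same moments also confirm third order rotatability, since the averages of $x^6$, $x^4y^2$ and $x^2y^2z^2$ stand in the ratio 15 : 3 : 1 to within the rounding of the published constants.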






Block coefficients may be introduced into the model with this design also.
Following Section 4, $n_1 - n_{01} = 20$ and $n_2 - n_{02} = 30$, so the equation which
expresses the condition for orthogonal blocking is

$$(30 + n_{02})\sum_{u=1}^{n_1} x_{iu}^2 = (20 + n_{01})\sum_{u=n_1+1}^{N} x_{iu}^2$$

or

(6.4)   $n_{02} = 2.206n_{01} + 14.124$





THIRD ORDER ROTATABLE DESIGNS 1093 


Applying equation (6.4), the following table of design numbers for
approximately orthogonal blocking results.

   $n_{01}$    $n_{02}$    $n_1$    $n_2$    N

    0      14      20      44     64
    1      16      21      46     67
    2      19      22      49     71
    3      21      23      51     74
    4      23      24      53     77
    5      25      25      55     80
    6      27      26      57     83
    7      30      27      60     87
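
The table can be regenerated from (6.4) by rounding to the nearest integer. The sketch below assumes $n_1 = 20 + n_{01}$ and $n_2 = 30 + n_{02}$, as stated in the text; the function name is hypothetical.

```python
def blocking_table(max_n01=7):
    """Rows (n01, n02, n1, n2, N) with n02 rounded from (6.4)."""
    rows = []
    for n01 in range(max_n01 + 1):
        n02 = int(2.206 * n01 + 14.124 + 0.5)   # nearest integer
        n1, n2 = 20 + n01, 30 + n02
        rows.append((n01, n02, n1, n2, n1 + n2))
    return rows


if __name__ == "__main__":
    for row in blocking_table():
        print(*row)
```

Because $n_{02}$ must be an integer, the blocking is only approximately orthogonal, which is why the table is described in those terms.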




7. Third order rotatable designs in more than three factors. As is well known, 
only the analogues of the tetrahedron, the octahedron and the cube exist, as 
regular figures in more than four dimensions. The latter two were used success- 
fully by Box and Hunter [1] in their development of 2nd ORDs in the higher 
dimensional factor spaces. In Sections 5 and 6 some semi-regular figures were 
described which provided 3rd ORDs for three dimensions. Of these semi-regular
figures, the truncated cube was of particular interest in that it provided a basis 
for the construction of three dimensional sequential 3rd ORDs. 

The higher dimensional analogue of the truncated cube is not easy to identify.
The obvious extension from three dimensions to $k$ dimensions would be the
figure whose coordinates are the permutations of the elements of $(\pm c, \pm d,
\ldots, \pm d)$, there being $k - 1$ elements $\pm d$. Call this "truncated cube (1)." A
less obvious, but nevertheless reasonable extension to $k$ dimensions is the figure
whose coordinates are the permutations of $(\pm c, \pm c, \ldots, \pm c, \pm d, \pm d, \ldots, \pm d)$
with, say, $r$ elements $\pm c$ and $k - r$ elements $\pm d$, and with $1 < r < (k + 1)/2$.
Let this figure be called "truncated cube (r)."

From the point of view of economy in the number of experiments, "truncated
cube (1)" would be preferred as a part of a design since it contains fewer points.
The number of points in "truncated cube (1)" is $k2^k$. The number of points in
"truncated cube (r)" is $\binom{k}{r}2^k$ which is always larger for $1 < r < k - 1$.
Unfortunately "truncated cube (1)" cannot be used with the "cube" and
"octahedra" to form sequential 3rd ORDs for $k > 3$. This can be shown as follows.

Consider the configuration made up of the k-dimensional "truncated cube (1)"
and a k-dimensional "octahedron" of radius $\rho$. For a sequential design of the
type described in Section 6, this composite configuration would comprise the
second stage of experimentation and therefore the coordinates must satisfy the
requirement (in addition to the requirements satisfied by the symmetry of the
configuration),

(7.1)   $\sum_u x_{iu}^4 = 3\sum_u x_{iu}^2x_{ju}^2 \qquad (i \ne j = 1, 2, \ldots, k).$

For this configuration

$$\sum_u x_{iu}^4 = 2^k[c^4 + (k - 1)d^4] + 2\rho^4$$
$$\sum_u x_{iu}^2x_{ju}^2 = 2^k(2c^2d^2 + (k - 2)d^4).$$

Substitution of the above into (7.1) requires

(7.2)   $c^2 = 3d^2 \pm [(4 + 2k)d^4 - 2^{1-k}\rho^4]^{1/2}.$

The configuration for the second stage when combined with the first stage
configuration consisting of a k-dimensional "cube" of radius $k^{1/2}$ and
k-dimensional "octahedron" of radius $2^{k/4}$ must satisfy the condition

(7.3)   $\sum_u x_{iu}^4x_{ju}^2 = 3\sum_u x_{iu}^2x_{ju}^2x_{lu}^2, \qquad (i \ne j \ne l = 1, 2, \ldots, k).$

For the combined configurations (7.3) requires

(7.4)   $c^2 = 4d^2 \pm [(9 + 2k)d^4 + 2d^{-2}]^{1/2}.$

The minus sign in (7.4) will give a negative $c^2$ (with $k > 3$) and therefore must
be disregarded. Equating (7.2) to (7.4) (with the plus sign) and simplifying
gives

(7.5)   $3d^6 + d^2(2^{-k}\rho^4 + [(9 + 2k)d^8 + 2d^2]^{1/2}) + 1 = 0,$

which is impossible since each term on the left of (7.5) must be a positive
quantity. Therefore, "truncated cube (1)" cannot be used in a 3rd ORD of this
form if $k$ is greater than 3.

With $k = 4$ a sequential 3rd ORD was discovered. The first stage of the design
consists of a four dimensional "cube" of radius 2 and a 4-dimensional
"octahedron," also of radius 2. (Actually, this is a 4-dimensional regular figure of 24
points.) The second stage is comprised of a 4-dimensional "truncated cube (2)"
with coordinates $(\pm c, \pm c, \pm d, \pm d)$ and another 4-dimensional "octahedron"
of radius $\rho$. To satisfy equation (2.5), we must have $c = 1.200919$, $d = .256303$,
$\rho = 1.736604$. This design contains 16 points on the "cube," 8 points on the
first "octahedron," 96 points on the "truncated cube" and 8 points on the
second "octahedron" for a total of 128 points without center points. The design
parameters are $\lambda_4 = .676$ and $\lambda_6 = .349$. The coordinates of experimental points
for this design are

(for the first stage)
the 16 permutations of $(\pm a, \pm a, \pm a, \pm a)$,
the 8 permutations of $(\pm 2a, 0, 0, 0)$,
and, if desired, central points (0, 0, 0, 0);
(for the second stage)
the 96 permutations of $(\pm 1.200919a, \pm 1.200919a, \pm .256303a, \pm .256303a)$,
the 8 permutations of $(\pm 1.736604a, 0, 0, 0)$,
and, if desired, central points (0, 0, 0, 0),
with $a$ such that $\sum_{u=1}^{N} x_{iu}^2 = N$.

For approximately orthogonal blocking of the two stages of experimentation,
the number of center points in each block, $n_{01}$ and $n_{02}$, is shown in the following
table. The total number of points and the design parameters are also given.


   $n_2$            N      $\lambda_4$     $\lambda_6$           $6\lambda_4^2/8$

   104            136    .719    .394           .388
   108            141    .745    [illegible]    .416
   [illegible]    145    .766    .447           .440
   114            149    .787    .472           .465

The relationship, $\lambda_6 > 6\lambda_4^2/8$, appears to be sufficiently well satisfied so that
no investigation of the design utilizing a doubled octahedron (and 8 additional
points) was made.
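
The four-factor design can be checked in the same way as the three-factor ones. The sketch below is illustrative only: it assumes the scaling $\sum x_{iu}^2 = N$, takes $\lambda_4$ and $\lambda_6$ as the averages of $x_1^2x_2^2$ and $x_1^2x_2^2x_3^2$, and uses hypothetical helper names.

```python
import itertools
import math


def signed_perms(vec):
    """Distinct points obtained by permuting vec and changing signs."""
    pts = set()
    for perm in itertools.permutations(vec):
        for signs in itertools.product((1, -1), repeat=len(vec)):
            pts.add(tuple(s * v for s, v in zip(signs, perm)))
    return sorted(pts)


def four_factor_design(n_center=0):
    """Cube + octahedron, then truncated cube (2) + octahedron, in 4 dimensions."""
    c, d, rho = 1.200919, 0.256303, 1.736604
    pts = (signed_perms((1.0, 1.0, 1.0, 1.0))
           + signed_perms((2.0, 0.0, 0.0, 0.0))
           + signed_perms((c, c, d, d))
           + signed_perms((rho, 0.0, 0.0, 0.0))
           + [(0.0, 0.0, 0.0, 0.0)] * n_center)
    a = math.sqrt(len(pts) / sum(p[0] ** 2 for p in pts))
    return [tuple(a * v for v in p) for p in pts]


def moment(pts, powers):
    """Average of prod x_i^powers[i] over the design points."""
    return sum(math.prod(v ** e for v, e in zip(p, powers)) for p in pts) / len(pts)


if __name__ == "__main__":
    pts = four_factor_design(0)
    lam4 = moment(pts, (2, 2, 0, 0))
    lam6 = moment(pts, (2, 2, 2, 0))
    print(len(pts), round(lam4, 3), round(lam6, 3), round(6 * lam4 ** 2 / 8, 3))
```

With no center points this gives 128 points, and the computed parameters can be compared with the values $\lambda_4 = .676$ and $\lambda_6 = .349$ quoted above.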

No attempt was made to extend the concept of 3rd ORDs to more than four 
dimensions, chiefly because the approach pursued in this paper required the use 
of an excessive number of experimental points. Investigations were made, follow- 
ing this approach, only of the sequential type of rotatable design because this 
is the type which seems likely to be most useful to an experimenter. 

Considerable savings were demonstrated by Box and Hunter in the case of 
2nd ORDs by the use of fractional replication for k > 4. With k equal to five 
or more the second order coefficients are confounded only with third and higher 
order effects when fractional replication is used. But for third order coefficients 
to be confounded only with fourth and higher effects, the dimensionality must 
be at least seven in order to make use of fractional replication. If a half replicate 
of a 7-dimensional design of the type described in the preceding section were 
possible it would require at least 1,436 experimental points. 

If a full replicate, 5-dimensional design of this type were possible, 372 points 
would be required. The same design in six factors would require 1,048. Third 
order rotatable designs derived from figures which are symmetrical in all k-di- 
mensions would appear to be impractical for k > 4. 
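
The point counts quoted here can be reproduced by simple counting. The sketch below assumes the k-dimensional design consists of a "cube" ($2^k$ points), a "truncated cube (2)" ($\binom{k}{2}2^k$ points) and two "octahedra" ($2k$ points each), and that a half replicate halves the cube and truncated cube portions. These choices are assumptions that happen to reproduce the figures in the text, and the function name is hypothetical.

```python
from math import comb


def design_size(k, r=2, half_replicate=False):
    """Number of non-central points in the assumed k-dimensional sequential design."""
    cube = 2 ** k
    truncated = comb(k, r) * 2 ** k
    if half_replicate:
        cube //= 2
        truncated //= 2
    octahedra = 2 * (2 * k)
    return cube + truncated + octahedra


if __name__ == "__main__":
    print(design_size(4), design_size(5), design_size(6),
          design_size(7, half_replicate=True))
```

The rapid growth of the truncated cube term is what makes the approach impractical beyond four factors.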


8. Summary and conclusions. This paper is concerned with extending the 
criterion of rotatability, as advanced by Box and Hunter [1], to experimental 
designs for estimating response surfaces by third order polynomial equations. 
The method of attack has been to examine combinations of regular and semi- 
regular geometrical figures and find those combinations whose coordinate points 
satisfy the moment properties, to order six, of spherical distributions. Designs 
with these properties and the attendant restrictions were shown by Box and 
Hunter to have spherical variance contours when the polynomial coefficients 
were estimated by the method of least squares. 

It was found that 3rd ORDs in two factors could be attained by locating
seven or more experimental points equally spaced on each of two concentric
circles of different non-zero radii. Also it was shown that certain rotatable
designs in two factors can be performed in two stages, so that second order
polynomial coefficients can be estimated after the first stage and third order
polynomial coefficients after the second stage. By choosing the radii of the two circles
in the proper ratio it is possible to obtain estimates of the polynomial coefficients
which are independent of "block" effects due to running the experiments in two
stages. Such designs were termed sequential 3rd ORDs.

In three factors, 3rd ORDs were presented which consisted of composites of 
cubes, truncated cubes, octahedra, cuboctahedra, icosahedra and dodecahedra. 
Two of these designs in three factors were constructed so that they might be 
performed sequentially. 

One sequential 3rd ORD in four factors was also presented. This design has as 
its experimental points the vertices of the 4-dimensional analogues of a cube, 
a truncated cube and two octahedra of different dimensions. 


9. Acknowledgement. The authors express their gratitude to the referee who 
called their attention to the theorem of Bose and Carter and suggested the 
proofs given in Section 3. As a result, Section 3 was shortened and considerably 
improved.


SECOND ORDER ROTATABLE DESIGNS IN THREE DIMENSIONS


By R. C. Bose and Norman R. Draper
University of North Carolina


0. Summary. The technique of fitting a response surface is one widely used 
(especially in the chemical industry) to aid in the statistical analysis of experi- 
mental work in which the “yield” of a product depends, in some unknown fashion, 
on one or more controllable variables. Before the details of such an analysis can 
be carried out, experiments must be performed at predetermined levels of the 
controllable factors, i.e., an experimental design must be selected prior to ex- 
perimentation. Box and Hunter [3] suggested designs of a certain type, which 
they called rotatable, as being suitable for such experimentation. Very few of 
these designs were then known. Since that time the work of R. L. Carter [6] has 
provided many new second order rotatable designs in two factors. However, addi- 
tional methods were needed which would provide both second and third order 
designs in three and more factors. The present work represents an attempt to 
meet, in part, this need. New construction methods for obtaining rotatable
designs of second order in three dimensions are here presented. By use of these
methods various infinite classes of designs are obtained, and it may be shown
that all the rotatable designs previously known can be derived as special cases
of these infinite classes. Also derived is an infinite class of second order rotatable
designs which contain only 16 points; only two particular designs contain fewer
points.


1. Introduction. A great deal of information is now available about the theory 
of response surfaces and the use of rotatable designs. Such information may be 
found in papers by Box [1], [2], Box and Wilson [5], Box and Hunter [3], [4] and 
the Ph.D. dissertation of Carter [6]. The paper [3] by Box and Hunter provides 
the necessary background for the present work, and a discussion of polynomial 
approximation and of the desirability of rotatable designs will be found therein. 
We shall be concerned here with second order rotatable designs in three control- 
lable factors and we shall assume that the measurements of the factors have 
been coded, permitting the use of Cartesian axes in three dimensional space to 
describe an experimental design. 

Suppose, in an experimental investigation with $k$ factors, $N$ (not necessarily
distinct) combinations of levels are employed. Thus the group of $N$ experiments
which arises can be described by the $N$ points in $k$ dimensions

(1.1)   $(x_{1u}, x_{2u}, \ldots, x_{ku}), \qquad u = 1, 2, \ldots, N,$

where, in the $u$th experiment, factor $i$ is at level $x_{iu}$.
The set of points (1.1) is said to form a rotatable arrangement of the second
order in $k$ factors if the following conditions are satisfied:

(1.2)   $\sum_u x_{1u}^2 = \sum_u x_{2u}^2 = \cdots = \sum_u x_{ku}^2 = \lambda_2 N,$
        $\sum_u x_{1u}^4 = \sum_u x_{2u}^4 = \cdots = \sum_u x_{ku}^4 = 3\sum_u x_{iu}^2x_{ju}^2 = 3\lambda_4 N, \qquad (i \ne j)$

and all other sums of powers and products up to and including order four are
zero, where all summations are over $u = 1$ to $u = N$. The set (1.1) is said to form
a rotatable design of second order if the conditions (1.2) are satisfied and a certain
matrix used in a subsequent least squares estimation is non-singular. Box and
Hunter [3] show that the necessary and sufficient condition for this to be so is

(1.3)   $\lambda_4/\lambda_2^2 > k/(k + 2),$

a condition which may always be satisfied merely by the addition of points at the
center (0, 0, 0) of the design. Equality in (1.3) is attained when all the design
points lie on a k-dimensional sphere, and it is impossible for the inequality in (1.3)
to be reversed under any circumstances.

When presenting a rotatable design, it is customary to "scale" it. By this it is
meant that the scale of the coded controllable variables is chosen in such a way
that $\lambda_2 = 1$. The reason for this is as follows. Given a second order design which
satisfies the conditions (1.2) with a specified value of $\lambda_4/\lambda_2^2$, there are an infinite
number of values possible for $\lambda_2 > 0$. Since these designs can be derived one from
the other merely by change of scale, we do not regard them as different. Thus
the scaling condition $\lambda_2 = 1$ fixes a particular design and enables better
comparison between two designs with different values of $\lambda_4/\lambda_2^2$.


2. A transformation group in three dimensions and its generated point sets.
We shall define certain transformations applied to points in three dimensions.
Let $W(x, y, z) = (y, z, x)$. Then $W^2(x, y, z) = (z, x, y)$ and $W^3(x, y, z) =
(x, y, z)$. Thus $W$, $W^2$ and $W^3 = I$ form a cyclical group of order 3. Further let
$R_1(x, y, z) = (-x, y, z)$, $R_2(x, y, z) = (x, -y, z)$, $R_3(x, y, z) = (x, y, -z)$.

The four transformations represented by $W$, $R_1$, $R_2$ and $R_3$ generate a
group $G$ of transformations of order 24 with elements

(2.1)   $W^j,\ W^jR_1,\ W^jR_2,\ W^jR_3,\ W^jR_2R_3,\ W^jR_3R_1,\ W^jR_1R_2,\ W^jR_1R_2R_3 \qquad (j = 1, 2, 3).$

It is easily seen that all the 24 elements in (2.1) are distinct. While $R_1$, $R_2$ and
$R_3$ commute, $W^j$ and $R_i$ do not $(j = 1, 2;\ i = 1, 2, 3)$.

A group table may be constructed, employing the identities

(2.2)   $W^3 = R_1^2 = R_2^2 = R_3^2 = I$

and identities of the type $WR_1 = R_3W$, to verify the statements above. Because
of the size of the group the table will not be reproduced here.
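
Although the group table is omitted, the action of the group on a point is easy to enumerate. The sketch below is illustrative: it represents each element of $G$ as a power of $W$ followed by sign changes, composes transformations right to left (so that $WR_1$ means "apply $R_1$, then $W$"), and uses hypothetical function names.

```python
import itertools


def W(p):
    """Cyclic shift (x, y, z) -> (y, z, x)."""
    x, y, z = p
    return (y, z, x)


def R(i, p):
    """Change the sign of coordinate i (i = 1, 2, 3)."""
    return tuple(-c if j == i - 1 else c for j, c in enumerate(p))


def orbit(point):
    """Set of distinct images of point under the 24 elements of G."""
    images = set()
    q = tuple(point)
    for _ in range(3):                      # W^1, W^2, W^3 = I
        q = W(q)
        for signs in itertools.product((1, -1), repeat=3):
            images.add(tuple(s * c for s, c in zip(signs, q)))
    return images


if __name__ == "__main__":
    for p in [(1, 2, 3), (1, 2, 0), (1, 1, 1), (1, 0, 0)]:
        print(p, len(orbit(p)))
```

A general point yields 24 distinct images, while special points such as $(p, q, 0)$, $(a, a, a)$ and $(c, 0, 0)$ yield 12, 8 and 6, which is the coincidence in pairs, triplets and quadruplets used later in the paper.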

Given a general point $(x, y, z)$ in three dimensions, we may apply to it all the
transformations of the group $G$. In this way we obtain a set of 24 points with
coordinates

(2.3)   $(\pm x, \pm y, \pm z), \quad (\pm y, \pm z, \pm x), \quad (\pm z, \pm x, \pm y).$

We shall denote this set by

(2.4)   $G(x, y, z).$

Note that if $(l, m, n)$ denotes any other point of the set, $G(x, y, z) = G(l, m, n)$,
i.e., any point of the set, when operated on by $G$, will give rise to the same set.
The set $G(x, y, z)$ satisfies all the moment conditions (1.2) except

(2.5)   $\sum_{u=1}^{N} x_{iu}^4 = 3\sum_{u=1}^{N} x_{iu}^2x_{ju}^2 \quad (i \ne j), \quad (i, j = 1, 2, 3).$

We now define a function $K(x, y, z)$ of the point $(x, y, z)$ as

(2.6)   $K(x, y, z) = \tfrac{1}{3}(x^4 + y^4 + z^4 - 3y^2z^2 - 3z^2x^2 - 3x^2y^2).$

This function is constant for all of the 24 points of $G(x, y, z)$. Furthermore, if
it has the value zero, then $G(x, y, z)$ is a rotatable arrangement since the
outstanding condition (2.5) becomes satisfied. Let

(2.7)   $x^2 = sz^2, \quad y^2 = tz^2.$

Then, if $K(x, y, z)$ is zero and $z \ne 0$,

(2.8)   $t^2 - 3t(s + 1) + (s^2 - 3s + 1) = 0.$

This is the equation of a hyperbola. If the point $(s, t)$ lies on the hyperbola
and also in the first quadrant, $G(x, y, z)$ is a rotatable arrangement. Fig. 1
shows points $(s, t)$ for which this is true. There is complete symmetry about the
line $s = t$. The value of $s$ at the points $P_1$ and $P_2$, where the hyperbola intersects
the line $t = 0$, is $(3 - \sqrt{5})/2$ and $(3 + \sqrt{5})/2$, respectively. If we solve for $t$
in terms of $s$, we obtain

(2.9)   $t = \tfrac{1}{2}[3(s + 1) \pm \sqrt{5(s^2 + 6s + 1)}].$

This yields two non-negative solutions if $s^2 - 3s + 1 \ge 0$, which implies $s \ge
(3 + \sqrt{5})/2$ or $0 \le s \le (3 - \sqrt{5})/2$. Otherwise there is only one positive
solution for each value of $s \ge 0$. The reason for this is clear from Fig. 1.
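
The solution (2.9) can be checked numerically. The sketch below assumes the factor in (2.6) is $1/3$, which is the value implied by the excess of the full 24-point set given in (2.12); the function names are hypothetical.

```python
import math


def K(x, y, z):
    """The function of (2.6), with the 1/3 factor inferred from (2.12)."""
    return (x ** 4 + y ** 4 + z ** 4
            - 3 * (y * y * z * z + z * z * x * x + x * x * y * y)) / 3


def t_values(s):
    """Non-negative roots t of (2.8) for a given s >= 0, from (2.9)."""
    disc = math.sqrt(5 * (s * s + 6 * s + 1))
    roots = [(3 * (s + 1) + disc) / 2, (3 * (s + 1) - disc) / 2]
    return [t for t in roots if t >= 0]


if __name__ == "__main__":
    for s in (0.2, 1.0, 3.0, math.sqrt(10) - 3):
        for t in t_values(s):
            # Take z = 1, so x^2 = s and y^2 = t.
            print(s, t, K(math.sqrt(s), math.sqrt(t), 1.0))
```

For $s = \sqrt{10} - 3$ one of the roots is $t = s$, the symmetric point on the hyperbola.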

The point set $G(x, y, z)$ is clearly spherical, and thus equality will be attained
in the non-singularity condition (1.3) unless additional points are added at the
center to form the design. If $n_0$ center points are added, $N = 24 + n_0$, and $\lambda_2 N =
8(x^2 + y^2 + z^2) = 8(s + t + 1)z^2$.
Thus if we apply the scaling condition $\lambda_2 = 1$,

(2.10)   $z^2 = N/8(s + t + 1),$

and we have an infinite class of second order designs which depends on one
parameter $s$. For if $s \ge 0$ is specified,

$$t = \tfrac{1}{2}[3(s + 1) \pm \sqrt{5(s^2 + 6s + 1)}],$$

$$z = [N/8(s + t + 1)]^{1/2}, \quad y = t^{1/2}z, \quad x = s^{1/2}z,$$

and all design points are fixed. Each non-negative $s$ gives rise to one or two
designs according as (2.9) yields one or two non-negative values of $t$. For this
class, $\lambda_4/\lambda_2^2 = 8(x^2y^2 + y^2z^2 + z^2x^2)/N = 8(st + s + t)z^4/N$. Consider the
special case $s = t = \sqrt{10} - 3$. We then have $x^2 = y^2 = (N - 8z^2)/16$, $z =
[(5 + 2\sqrt{10})N/120]^{1/2}$. This is the design referred to as the truncated cube by
Gardiner, Grandage and Hader ([9], Sec. 6, Par. 4).

Let us now suppose that $K(x, y, z) \ne 0$ for the points of the set $G(x, y, z)$. We
shall define $\sum K(x, y, z)$ over a point set $S$ to be the excess of that set and write
it $Ex(S)$. Thus


IV 


0 only), 
(2.11) 


2 


(2.12)   $Ex[G(x, y, z)] = 8(x^4 + y^4 + z^4 - 3y^2z^2 - 3z^2x^2 - 3x^2y^2).$

This can take both positive and negative values according to the choice of $x$, $y$
and $z$. Clearly, $\sum_i Ex(S_i) = Ex(\sum_i S_i)$, where the notation $\sum_i S_i$ means
that points which belong to more than one set $S_i$ contribute to the sum each
time they occur. The notation thus does not denote the "union" of sets in the
usual sense. Furthermore, if a number of sets $S_1, S_2, \ldots, S_m$ (say) satisfy,
either separately or together, the conditions for a second order rotatable
arrangement except for the condition (2.5), then the condition

(2.13)   $Ex(S_1 + \cdots + S_m) = Ex(S_1) + \cdots + Ex(S_m) = 0$

is a necessary and sufficient condition for the points of the whole set $S_1 + S_2 +
\cdots + S_m$ to form a rotatable arrangement of second order. We shall make use
of this important fact in Section 3.

For certain special choices of $(x, y, z)$ in three dimensions, the 24 points of
$G(x, y, z)$ will coincide in pairs or in triplets or in quadruplets. For example,
$G(p, q, 0)$ consists of the twelve points

(2.14)   $(\pm p, \pm q, 0), \quad (0, \pm p, \pm q), \quad (\pm q, 0, \pm p),$

each occurring twice. We may denote the 12 point set by $\tfrac{1}{2}G(p, q, 0)$. This set
has excess

(2.15)   $Ex[\tfrac{1}{2}G(p, q, 0)] = 4(p^4 + q^4 - 3p^2q^2),$

a quantity which may be made positive or negative according to the values of
$p$ and $q$.

The set $\tfrac{1}{2}G(p, q, 0)$ will itself form a rotatable arrangement if $p^4 - 3p^2q^2 +
q^4 = 0$ or $p^2/q^2 = (3 \pm \sqrt{5})/2$. Thus $p/q = \theta$ or $\theta^{-1}$ where $\theta = (\sqrt{5} + 1)/2$,
$\theta^{-1} = (\sqrt{5} - 1)/2$. Thus the set reduces to the 12 points $(\pm\theta, \pm 1, 0)$, $(\pm 1,
0, \pm\theta)$, $(0, \pm\theta, \pm 1)$, which as Coxeter [8] shows constitute the vertices of an
icosahedron. Adding center points we get the icosahedron design given by
Box and Hunter [3].


3. The formation of rotatable arrangements and rotatable designs by
combination of several generated point sets. Consider the set $G(a, a, a)$; this consists
of the eight points

(3.1)   $(\pm a, \pm a, \pm a)$

each occurring three times. We may therefore denote this set of 8 points by
$\tfrac{1}{3}G(a, a, a)$.

(3.2)   $Ex[\tfrac{1}{3}G(a, a, a)] = -16a^4,$

which is always negative, hence this set alone cannot form a rotatable
arrangement.
Consider the set $G(c, 0, 0)$; this consists of the six points

(3.3)   $(\pm c, 0, 0), \quad (0, \pm c, 0), \quad (0, 0, \pm c)$

each occurring four times. The six points may be denoted by $\tfrac{1}{4}G(c, 0, 0)$.

(3.4)   $Ex[\tfrac{1}{4}G(c, 0, 0)] = 2c^4,$

which is always positive, and so this set alone cannot form a rotatable
arrangement.

For consistency of notation we may write the point (0, 0, 0) as $(1/24)G(0, 0, 0)$.
Hence $n_0$ center points may be denoted by

(3.5)   $\frac{n_0}{24}G(0, 0, 0).$

Consider the combination of sets $\tfrac{1}{3}G(a, a, a)$ and $\tfrac{1}{4}G(c, 0, 0)$. Then

(3.6)   $Ex[\tfrac{1}{3}G(a, a, a) + \tfrac{1}{4}G(c, 0, 0)] = -16a^4 + 2c^4.$

This is zero if $c^2 = 2\sqrt{2}a^2$, in which case the 14 points form a rotatable
arrangement. The actual design points are obtained by applying the scaling condition
$\lambda_2 = 1$. This gives $8a^2 + 2c^2 = N = 14 + n_0$, where $n_0$ is the number of center
points added. Thus $4(2 + \sqrt{2})a^2 = N$, and both $a$ and $c$ are determined when
$N$ is given. We have obtained the well-known cube plus octahedron design first
presented by Box and Hunter [3].
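
The construction just described can be written out as a short computation. The sketch below is illustrative: it assumes the excess of a point set is $\sum x_{iu}^4 - 3\sum x_{iu}^2x_{ju}^2$ (the quantity that (2.5) requires to vanish), and the function names are hypothetical.

```python
import math


def cube_plus_octahedron(n_center=0):
    """Scaled cube plus octahedron design with c^2 = 2*sqrt(2)*a^2 and lambda_2 = 1."""
    N = 14 + n_center
    a = math.sqrt(N / (4 * (2 + math.sqrt(2))))
    c = math.sqrt(2 * math.sqrt(2)) * a
    pts = [(sx * a, sy * a, sz * a)
           for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)]
    for axis in range(3):
        for sign in (1, -1):
            p = [0.0, 0.0, 0.0]
            p[axis] = sign * c
            pts.append(tuple(p))
    pts += [(0.0, 0.0, 0.0)] * n_center
    return pts


def excess(pts):
    """sum x^4 - 3 * sum x^2 y^2 over the point set."""
    return sum(x ** 4 for x, _, _ in pts) - 3 * sum(x * x * y * y for x, y, _ in pts)


if __name__ == "__main__":
    pts = cube_plus_octahedron(n_center=2)
    lam2 = sum(x * x for x, _, _ in pts) / len(pts)
    print(len(pts), lam2, excess(pts))
```

The excess vanishes for any number of center points, since the origin contributes nothing to either sum; the center points only change the scale factor $a$.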

The method may now be extended. We have seen that the combination of
generated sets leads to a single design when only two parameters are present,
as in the example just given, since the two conditions $\mathrm{Ex}(\text{set}) = 0$,
$\lambda_2 = 1$ completely determine the design. The first condition alone completely
determines the ratio of the two parameters and is sufficient to determine the
design apart from scale. We now examine a combination of sets which contains
three parameters. We shall see that we obtain a single infinity of designs which
depend on a single parameter ratio. Consider the 20 points

$\tfrac14 G(c_1, 0, 0)$, $\tfrac14 G(c_2, 0, 0)$, $\tfrac13 G(a, a, a)$.

The excess of this whole set is $2c_1^4 + 2c_2^4 - 16a^4$. Note that since
$\mathrm{Ex}[\tfrac13 G(a, a, a)] = -16a^4$ is negative, we must combine with it sets at least
one of which has positive excess to compensate. Thus the set has zero excess if
$c_1^4 + c_2^4 = 8a^4$. Set $c_1^2 = xa^2$, $c_2^2 = ya^2$. Then $x^2 + y^2 = 8$. Any positive
values of $x$ and $y$ which satisfy this equation will give rise to a rotatable
arrangement of second order. Thus if $(x, y)$ is a point of the circle
$x^2 + y^2 = 8$ and also lies in the first quadrant, then we shall have a rotatable
arrangement. No additional center points are required to make the arrangement
into a design since the three radii of the parts of the arrangement, $x^{1/2}a$,
$y^{1/2}a$ and $\sqrt3\,a$, are not all equal. Now $N\lambda_2 = 2c_1^2 + 2c_2^2 + 8a^2 =
2(x + y + 4)a^2$. Applying the scaling condition $\lambda_2 = 1$, we obtain

$y = (8 - x^2)^{1/2}$, $a = [N/\{2(x + y + 4)\}]^{1/2}$, $c_1 = x^{1/2}a$, $c_2 = y^{1/2}a$,

and the design becomes completely determined. For this class, $\lambda_4/\lambda_2^2 = 8a^4/N$.
We now derive, as special cases of the infinite class just obtained, two designs
which were previously known.

(1) $x = 0$. Then $y = 2\sqrt2$, $c_2 = 2^{3/4}a$, $c_1 = 0$. We have obtained the cube plus
octahedron design, with 6 center points which are vertices of the degenerate
octahedron.

(2) $x = y = 2$. Then $c_1 = c_2 = a\sqrt2$. This gives rise to the design described
by Gardiner, Grandage and Hader, which consists of the vertices of a cube
plus those of a doubled octahedron ([9], Sec. 6, Par. 6, first stage).

The first summary table which occurs in Section 5 contains several other
infinite classes of this type.
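The one-parameter class of this section is easy to tabulate. The sketch below is an illustration only, with assumed function names: it takes the parameter $x$, computes $y$, $a$, $c_1$, $c_2$ from the displayed formulas, builds the 20 points, and leaves the two defining conditions to be checked from the points themselves.

```python
from itertools import product
from math import sqrt, isclose

def class_parameters(x, n0=0):
    """Return (y, a, c1, c2) for the 20-point class, given 0 <= x <= sqrt(8)."""
    N = 20 + n0
    y = sqrt(8.0 - x * x)                      # point on the circle x^2 + y^2 = 8
    a = sqrt(N / (2.0 * (x + y + 4.0)))        # scaling condition lambda_2 = 1
    return y, a, sqrt(x) * a, sqrt(y) * a

def design_points(x, n0=0):
    """Cube (+-a, +-a, +-a) plus two octahedra of radii c1 and c2, plus n0 centers."""
    y, a, c1, c2 = class_parameters(x, n0)
    points = [(sx * a, sy * a, sz * a) for sx, sy, sz in product((1, -1), repeat=3)]
    for c in (c1, c2):
        for axis in range(3):
            for sign in (1, -1):
                point = [0.0, 0.0, 0.0]
                point[axis] = sign * c
                points.append(tuple(point))
    return points + [(0.0, 0.0, 0.0)] * n0

def second_order_sums(points):
    """Return (sum x1^2, sum x1^4, sum x1^2 x2^2)."""
    return (sum(p[0] ** 2 for p in points),
            sum(p[0] ** 4 for p in points),
            sum(p[0] ** 2 * p[1] ** 2 for p in points))

s2, s4, s22 = second_order_sums(design_points(2.0))   # special case (2): x = y = 2
doubled_octahedron_ok = isclose(s2, 20.0) and isclose(s4, 3.0 * s22)
```

Setting `x = 0.0` reproduces special case (1), in which six of the twenty points collapse to the origin.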


4. Classes of designs using sets with variable excess. In the previous section
the sets we used in combination had a positive or a negative excess. Let us now
consider the set of 12 points $\tfrac12 G(p, q, 0)$. The excess of this set is
$4(p^4 + q^4 - 3p^2q^2)$, a quantity which may be made positive or negative according
to the way $p$ and $q$ are chosen. Thus $\tfrac12 G(p, q, 0)$ may be combined with all of
the sets $\tfrac13 G(a, a, a)$, $\tfrac14 G(c, 0, 0)$ and $\tfrac12 G(f, f, 0)$ to obtain rotatable
arrangements and hence designs. For example,
$\mathrm{Ex}[\tfrac12 G(p, q, 0) + \tfrac13 G(a, a, a)] = 0$ if $p^4 + q^4 = 3p^2q^2 + 4a^4$. Set
$p^2 = xa^2$, $q^2 = ya^2$, and we have $x^2 - 3xy + y^2 = 4$. Any point of this hyperbola
which lies in the first quadrant will give rise to a rotatable arrangement of
second order. If we solve for $y$ in terms of $x$, we obtain
$y = [3x \pm \sqrt{5x^2 + 16}]/2$. This yields two positive solutions if $x > 2$;
otherwise only one positive solution arises. This may easily be seen from
Fig. 2. The radii of the separate point sets are $\sqrt{x + y}\,a$ and $\sqrt3\,a$, and
these are equal when $x + y = 3$. Since the straight line $x + y = 3$ intersects the
hyperbola in two points $P$ and $Q$, equality in (1.3) occurs for two arrangements
of the class. For these two arrangements, the addition of center points is
necessary to satisfy the non-singularity condition. Applying the scaling
condition $\lambda_2 = 1$, we obtain an infinite class of second order rotatable designs,
each design consisting of 20 points plus any center points which may have been
added. The class depends on one parameter $x$. Given $x \ge 0$,

$y = [3x \pm \sqrt{5x^2 + 16}]/2$, $a = [N/\{4(x + y + 2)\}]^{1/2}$, $p = x^{1/2}a$, $q = y^{1/2}a$,

where the lower sign in $y$ is to be taken only when $x > 2$. For this class

$\lambda_4/\lambda_2^2 = (8a^4 + 4p^2q^2)/N = 4(2 + xy)a^4/N$.

This class has two well-known special cases.

(1) When $a = 0$, $x = \infty$, $y = \infty$. Ignoring the degenerate set $\tfrac13 G(a, a, a)$,
we obtain the icosahedron design discussed at the end of Section 2.

(2) If we choose one of the two points on the hyperbola for which $x + y = 3$,
then $x = y^{-1} = (3 \pm \sqrt5)/2 = \theta^2, \theta^{-2}$, where $\theta = (\sqrt5 + 1)/2$. Thus the 20
design points (other than the center points) consist of constant multiples of

$(0, \pm\theta^{-1}, \pm\theta)$, $(\pm\theta, 0, \pm\theta^{-1})$, $(\pm\theta^{-1}, \pm\theta, 0)$, $(\pm1, \pm1, \pm1)$.

As Coxeter [8] shows, these are the vertices of a dodecahedron, which form a
well-known second order rotatable design, given in [3].
Several other classes of this type may be found in the summary table.
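A short numerical sketch of this class follows. It is an illustration under the formulas above, not code from the paper; the function names and the `lower` flag (selecting the lower sign in $y$) are assumptions, and $x > 0$ is assumed so that the twelve points of $\tfrac12 G(p, q, 0)$ are distinct.

```python
from itertools import product
from math import sqrt, isclose

def class_parameters(x, lower=False, n0=0):
    """Return (y, a, p, q) on the hyperbola x^2 - 3xy + y^2 = 4 with lambda_2 = 1."""
    if lower and x <= 2.0:
        raise ValueError("the lower sign gives a positive y only when x > 2")
    N = 20 + n0
    root = sqrt(5.0 * x * x + 16.0)
    y = (3.0 * x - root) / 2.0 if lower else (3.0 * x + root) / 2.0
    a = sqrt(N / (4.0 * (x + y + 2.0)))
    return y, a, sqrt(x) * a, sqrt(y) * a

def design_points(x, lower=False, n0=0):
    """The 8 cube points, the 12 points of (1/2)G(p, q, 0), and n0 center points."""
    y, a, p, q = class_parameters(x, lower, n0)
    points = [(sx * a, sy * a, sz * a) for sx, sy, sz in product((1, -1), repeat=3)]
    for sp, sq in product((1, -1), repeat=2):
        u, v = sp * p, sq * q
        points += [(u, v, 0.0), (0.0, u, v), (v, 0.0, u)]
    return points + [(0.0, 0.0, 0.0)] * n0

def second_order_sums(points):
    """Return (sum x1^2, sum x1^4, sum x1^2 x2^2)."""
    return (sum(pt[0] ** 2 for pt in points),
            sum(pt[0] ** 4 for pt in points),
            sum(pt[0] ** 2 * pt[1] ** 2 for pt in points))

# Special case (2): x = theta^{-2} gives y = theta^2 and a common radius.
x_dodecahedron = (3.0 - sqrt(5.0)) / 2.0
radii = [sqrt(u * u + v * v + w * w) for u, v, w in design_points(x_dodecahedron)]
equal_radii = all(isclose(r, radii[0]) for r in radii)
```

Because all twenty radii coincide in that case, center points (`n0 > 0`) would be needed there, in line with the remark on the non-singularity condition.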


5. Summary table. Table I is a table of infinite classes of second order
rotatable designs in three dimensions of the type derived in Sections 3 and 4.
The table shows the generated sets used to form each class together with the
design coordinate values in terms of a single parameter.


6. A second method of generating point sets suitable for building second order
rotatable designs. Define

$$T_1 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
T_2 = \begin{pmatrix} \cos\tfrac{\alpha}{2} & -\sin\tfrac{\alpha}{2} & 0 \\ \sin\tfrac{\alpha}{2} & \cos\tfrac{\alpha}{2} & 0 \\ 0 & 0 & -1 \end{pmatrix},$$

where $\alpha = 2\pi/s$. Consider the effect of applying $T_1$ and $T_2$ to points of the
form $(r, 0, b)$, i.e., points on the plane $y = 0$, and to all other points obtained
from repeated applications of $T_1$ and $T_2$. In this way we shall obtain $2s$ points
with coordinates

(6.1) $(r\cos t\alpha,\ r\sin t\alpha,\ b)$, $(r\cos(t + \tfrac12)\alpha,\ r\sin(t + \tfrac12)\alpha,\ -b)$,

where $t = 0, 1, 2, \ldots, (s - 1)$. We shall denote the set of these $2s$ points by
$T_1(r, 0, b)$. Provided $s \ge 5$, the set $T_1(r, 0, b)$ has the following sums of powers
and products:

$$\sum_u x_u^2 = \sum_u y_u^2 = sr^2, \qquad \sum_u z_u^2 = 2sb^2,$$
$$\sum_u x_u^2y_u^2 = sr^4/4, \qquad \sum_u y_u^2z_u^2 = \sum_u z_u^2x_u^2 = sr^2b^2,$$
$$\sum_u x_u^4 = \sum_u y_u^4 = 3sr^4/4, \qquad \sum_u z_u^4 = 2sb^4,$$

and all other sums of powers and products up to and including order four are
zero. This is easily verified by using the fact that each of the two $s$-gons in the
set of $2s$ points is a second order rotatable arrangement in two dimensions [3].
A rotation about the $z$ axis of the complete point arrangement will not affect the
properties held by the sums of powers and products. We now recall the cyclic
group $W$, $W^2$, $I$, defined in Section 2, and apply this to $T_1(r, 0, b)$ to give the sets
$T_1(b, r, 0)$ and $T_1(0, b, r)$. In all we now have $6s$ points, which we denote by
$T(r, 0, b)$, with coordinates

$(r\cos t\alpha,\ r\sin t\alpha,\ b)$, $(r\cos(t + \tfrac12)\alpha,\ r\sin(t + \tfrac12)\alpha,\ -b)$,
$(b,\ r\cos t\alpha,\ r\sin t\alpha)$, $(-b,\ r\cos(t + \tfrac12)\alpha,\ r\sin(t + \tfrac12)\alpha)$,
$(r\sin t\alpha,\ b,\ r\cos t\alpha)$, $(r\sin(t + \tfrac12)\alpha,\ -b,\ r\cos(t + \tfrac12)\alpha)$,

where $\alpha = 2\pi/s$ $(s \ge 5)$ and $t = 0, 1, 2, \ldots, (s - 1)$.
The set $T(r, 0, b)$ has sums of powers and products

$$\sum_u x_u^2 = \sum_u y_u^2 = \sum_u z_u^2 = 2s(r^2 + b^2),$$
$$\sum_u x_u^4 = \sum_u y_u^4 = \sum_u z_u^4 = s(3r^4 + 4b^4)/2,$$
$$\sum_u x_u^2y_u^2 = \sum_u y_u^2z_u^2 = \sum_u z_u^2x_u^2 = sr^2(r^2 + 8b^2)/4,$$

and all other sums of powers and products up to and including order four are
zero.

The formulae for the sums of powers and products will extend to the case
$s = 4$, provided we fix as the set $T_1(r, 0, b)$ the points

$(\pm r, 0, b)$, $(0, \pm r, b)$, $(\pm r/\sqrt2, \pm r/\sqrt2, -b)$.

In the case $s = 4$, rotation of the $s$-gons about the $z$ axis will affect the sums of
powers and products and thus cannot be permitted. This point must be
remembered whenever specific reference is made to the case $s = 4$. From the
properties of sums of powers and products given above, it follows that the excess
of the set, defined in the same way as before, is $s(3r^4 - 24r^2b^2 + 8b^4)/4$. Of course
the excess of each single point varies in this case and it is necessary to consider
the total effect over all the points. Since its excess can be made positive or
negative according to the choice of $r$ and $b$, it will be possible to combine the set
$T(r, 0, b)$ with sets of both positive and negative excess. Because of the large
number of points which would otherwise arise, we shall combine it only with
$\tfrac13 G(a, a, a)$ and $\tfrac14 G(c, 0, 0)$. The designs thus obtained will be found in the
second summary table below.
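The stated sums for the $6s$-point set can be confirmed numerically for any $s \ge 5$. The following sketch is illustrative only (the function names are assumptions); it generates the points from the coordinate list above, using the cyclic shifts of the coordinates for $W$ and $W^2$.

```python
from math import cos, sin, pi, isclose

def T_set(r, b, s):
    """The 6s points of T(r, 0, b): two s-gons at heights +b and -b, cycled through the axes."""
    alpha = 2.0 * pi / s
    base = []
    for t in range(s):
        base.append((r * cos(t * alpha), r * sin(t * alpha), b))
        base.append((r * cos((t + 0.5) * alpha), r * sin((t + 0.5) * alpha), -b))
    points = []
    for x, y, z in base:
        points += [(x, y, z), (z, x, y), (y, z, x)]   # identity and the two cyclic shifts
    return points

def power_sums(points):
    """Return (sum x^2, sum x^4, sum x^2 y^2)."""
    return (sum(x ** 2 for x, _, _ in points),
            sum(x ** 4 for x, _, _ in points),
            sum(x ** 2 * y ** 2 for x, y, _ in points))

def excess_formula(r, b, s):
    """Closed form s(3r^4 - 24 r^2 b^2 + 8 b^4)/4 quoted in the text."""
    return s * (3.0 * r ** 4 - 24.0 * r ** 2 * b ** 2 + 8.0 * b ** 4) / 4.0

r, b, s = 1.3, 0.7, 5
s2, s4, s22 = power_sums(T_set(r, b, s))
matches = isclose(s4 - 3.0 * s22, excess_formula(r, b, s))
```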

In the same way that special choices of $x$, $y$, and $z$ made it possible to take
fractions of $G(x, y, z)$, a special choice of $b$ will enable us to use a smaller point
set than $T(r, 0, b)$. Set $b = 0$; then by employing only the transformations $T_1$
and $W$ we can produce a set of $3s$ points with suitable moment properties. We
shall denote these $3s$ points by the notation $T_0(r, 0, 0)$. The points will have
coordinates

$(r\cos t\alpha,\ r\sin t\alpha,\ 0)$, $(r\sin t\alpha,\ 0,\ r\cos t\alpha)$, $(0,\ r\cos t\alpha,\ r\sin t\alpha)$,

where $t = 0, 1, 2, \ldots, (s - 1)$ and $s \ge 5$. The sums of powers and products of
the set are

$$\sum_u x_u^2 = \sum_u y_u^2 = \sum_u z_u^2 = sr^2,$$
$$\sum_u x_u^4 = \sum_u y_u^4 = \sum_u z_u^4 = 3sr^4/4,$$
$$\sum_u x_u^2y_u^2 = \sum_u y_u^2z_u^2 = \sum_u z_u^2x_u^2 = sr^4/8,$$

and all other sums of powers and products up to and including order four are
zero.

Clearly any rotation of the 3 $s$-gons about their axes will also give rise to the
same moments, but we shall restrict attention here to the set $T_0(r, 0, 0)$. From
the sums of powers and products it follows that the excess of this set is $3sr^4/8$,
which is a positive excess. Thus to form an infinite class of second order designs
we must combine $T_0(r, 0, 0)$ with sets at least one of which has negative excess.
Two examples of this will be found in Table II.

7. An extension of the method: a 16 point design class. Consider the set of 12
points

(7.1)
$(x, y, z)$, $(-x, -y, z)$, $(-x, y, -z)$, $(x, -y, -z)$,
$(y, z, x)$, $(-y, -z, x)$, $(y, -z, -x)$, $(-y, z, -x)$,
$(z, x, y)$, $(-z, x, -y)$, $(z, -x, -y)$, $(-z, -x, y)$.

This set consists of all points of $G(x, y, z)$ for which the product of the
coordinates is $xyz$. It can be described as a $\tfrac12$ replicate of $G(x, y, z)$ and we
shall write it

(7.2) $G^{(+)}(x, y, z)$.

The complementary set, where the product of the coordinates is $-xyz$, we shall
denote by

(7.3) $G^{(-)}(x, y, z)$.

The set (7.2) satisfies all the conditions for a second order rotatable arrangement
except two. These are

(7.4) $\sum_{u=1}^{N} x_{iu}^4 = 3\sum_{u=1}^{N} x_{iu}^2x_{ju}^2$ $(i, j = 1, 2, 3)$, $(i \ne j)$

and

(7.5) $\sum_{u=1}^{N} x_{1u}x_{2u}x_{3u} = 0$.

We recall that

(7.6) $\mathrm{Ex}[\text{Point set } (x_{1u}, x_{2u}, x_{3u}),\ u = 1, 2, \ldots, N] = \sum_{u=1}^{N} x_{iu}^4 - 3\sum_{u=1}^{N} x_{iu}^2x_{ju}^2$.

Let us define a second excess function which relates to the left member of (7.5) as

(7.7) $\mathrm{Fx}[\text{Point set } (x_{1u}, x_{2u}, x_{3u}),\ u = 1, 2, \ldots, N] = \sum_{u=1}^{N} x_{1u}x_{2u}x_{3u}$.

Then if $S$ is a point set or a combination of point sets which satisfies all of
conditions (1.2) except (7.4) and (7.5), and if

(7.8) $\mathrm{Ex}(S) = 0$, $\mathrm{Fx}(S) = 0$,

then $S$ is a rotatable arrangement of the second order. Now

(7.9) $\mathrm{Ex}[G^{(\pm)}(x, y, z)] = 4(x^4 + y^4 + z^4 - 3y^2z^2 - 3z^2x^2 - 3x^2y^2)$,

(7.10) $\mathrm{Fx}[G^{(\pm)}(x, y, z)] = \pm 12xyz$.

The set $G^{(+)}(a, a, a)$ consists of the four points

(7.11) $(a, a, a)$, $(a, -a, -a)$, $(-a, a, -a)$, $(-a, -a, a)$,

each repeated three times. Thus we may denote the four points (7.11), which
form a half replicate of the $2^3$ factorial design, by $\tfrac13 G^{(+)}(a, a, a)$. Similarly the
set $\tfrac13 G^{(-)}(a, a, a)$ consists of the four points

$(-a, -a, -a)$, $(-a, a, a)$, $(a, -a, a)$, $(a, a, -a)$.

It is easily seen that

$\mathrm{Ex}[\tfrac13 G^{(\pm)}(a, a, a)] = -8a^4$, $\mathrm{Fx}[\tfrac13 G^{(\pm)}(a, a, a)] = \pm 4a^3$.

Let $S$ be the set of 16 points defined by

$S = G^{(+)}(x, y, z) + \tfrac13 G^{(-)}(a, a, a)$,
$\mathrm{Ex}(S) = -8a^4 + 4(x^4 + y^4 + z^4 - 3y^2z^2 - 3z^2x^2 - 3x^2y^2)$,
$\mathrm{Fx}(S) = 12xyz - 4a^3$.

Thus $S$ is a rotatable arrangement if

(7.12) $x^4 + y^4 + z^4 - 3(y^2z^2 + z^2x^2 + x^2y^2) = 2a^4$, $3xyz = a^3$.

If we set

(7.13) $x^2 = ua^2$, $y^2 = va^2$, $z^2 = wa^2$,

it follows from (7.12) that we can write

$u + v + w = \beta$, $uv + vw + wu = (\beta^2 - 2)/5$, $uvw = 1/9$.

These equations imply that $u$, $v$ and $w$ are the roots of the cubic

(7.14) $t^3 - \beta t^2 + (\beta^2 - 2)t/5 - 1/9 = 0$.
If for a given $\beta$ this cubic has three positive roots $u$, $v$ and $w$, we shall be able to
use these values to obtain a rotatable arrangement of the second order which
contains only 16 points, using the relations (7.13). A sufficient condition for

(7.15) $Ax^3 + Bx^2 + Cx + D = 0$

to have three positive roots (provided all roots are real) is $A > 0$, $B < 0$, $C > 0$,
$D < 0$. Thus if $\beta > \sqrt2$ and all three roots of (7.14) are real, they are all positive.
The necessary and sufficient condition for (7.15) to have three real roots is
$\Delta = B^2C^2 + 18ABCD - 4AC^3 - 27A^2D^2 - 4B^3D > 0$ (see Conkwright [7]).
For the equation (7.14) we find

(7.16) $\Delta(\beta) = 3645(9\beta^6 + 36\beta^4 - 50\beta^3 - 252\beta^2 - 900\beta - 87)$.

It may be shown that $\Delta(2.691376)/3645 = .0031$, $\Delta(2.691375)/3645 = -.004$,
so that a root of $\Delta = 0$ lies near $\beta = 2.691376$. Furthermore

$\Delta(2.691376 + s)/3645 = .0031 + \Delta_1(s)$,

where $\Delta_1(s)$ is the following sixth degree polynomial in $s$ with all coefficients
positive:

$\Delta_1(s) = 9s^6 + 145.33s^5 + 1013.9s^4 + 3846.7s^3 + 7992.1s^2 + 7089.8s$.

Hence $s > 0 \Rightarrow \Delta_1(s) > 0 \Rightarrow \Delta(2.691376 + s) > 0$, and
$\Delta(\sqrt2)/3645 = -1789$, $\Delta''(\beta)/7290 = 135\beta^4 + 216\beta^2 - 150\beta - 252 > 0$
for $\beta > \sqrt2$.


This means that the function $\Delta$ is convex for $\beta > \sqrt2$ and thus has only one root
in that range, which must be at approximately $\beta = 2.691376$. Thus if $\beta > 2.7$
the equation (7.14) gives rise to three real positive roots $u$, $v$ and $w$ and the 16
points of $S$ form a second order rotatable arrangement. The radii of the two sets
of points which comprise the arrangement are $\sqrt\beta\,a$ and $\sqrt3\,a$. Thus, when $\beta = 3$
it will be necessary to add center points to the arrangement in order to satisfy
the non-singularity condition. It is desirable to add center points to
arrangements which arise from values of $\beta$ near the singular value 3 in order that
the variances of the estimates of the model coefficients will not be large. When
$a = 0$, we shall retain the degenerate points as center points. If $N = 16 + n_0$, where
$n_0$ is the number of center points added, it is easy to verify that the scaling
condition $\lambda_2 = 1$ leads to $a^2 = N/\{4(\beta + 1)\}$. Thus we have found an infinite class of
second order rotatable designs depending on a parameter $\beta$; each design
contains 16 points excluding any center points which may have been added. Given
a value of $\beta > 2.691376$, we can find $u$, $v$ and $w$, the positive roots of (7.14).
Then

$a = [N/\{4(\beta + 1)\}]^{1/2}$, $x = u^{1/2}a$, $y = v^{1/2}a$, $z = w^{1/2}a$,

and the design is completely determined. An easy calculation shows that

(7.17) $\lambda_4/\lambda_2^2 = (\beta^2 + 3)N/\{20(\beta + 1)^2\}$.

TABLE III
A Selection of Designs from the 16 Point Series (when $n_0 = 0$)

  beta         a          x          y          z       lambda_4/lambda_2^2
  2.691376   1.04096    .49090     .49090    1.56026    .60140
  2.7        1.03975    .45968     .52238    1.56036    .60131
  3          1.00000    .31645     .67348    1.56405    .60000
  4           .89443    .18375     .82366    1.57775    .60800
  5           .81650    .12862     .88669    1.59078    .62222
  6           .75593    .09737     .92330    1.60206    .63673
  7           .70711    .07722     .94697    1.61160    .65000
  8           .66667    .06328     .96348    1.61965    .66173
  9           .63246    .05321     .97559    1.62647    .67200
  11          .57735    .03951     .99212    1.63732    .68889
  14          .51640    .02767    1.00687    1.64887    .70756
  19          .44721    .01759    1.02001    1.66110    .72800
  49          .28284    .00430    1.04018    1.68464    .76928
  99          .20000    .00151    1.04601    1.69288    .78432
  infinity    0         0         1.05146    1.70130    .80000

When $n_0 \ne 0$, multiply $a$, $x$, $y$ and $z$ by $\alpha$ and multiply $\lambda_4/\lambda_2^2$ by $\alpha^2$, where $\alpha^2 = 1 + (n_0/16)$.
The variation in the values of $a$, $x$, $y$ and $z$ is so well controlled that it is possible to
use a graph to find their values for values of $\beta$ other than those in the table.
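The construction of a member of the 16 point class can be sketched as follows. This is an illustrative implementation, not the authors' procedure: the cubic (7.14) is solved here with the standard trigonometric formula for a cubic with three real roots, and the function names are assumptions. The points are $G^{(+)}(x, y, z)$ together with the four points whose coordinate product is $-a^3$, as in the definition of $S$.

```python
from math import sqrt, acos, cos, pi, isclose

def cubic_roots(beta):
    """Three real roots of t^3 - beta t^2 + (beta^2 - 2) t/5 - 1/9 = 0, in increasing order."""
    B, C, D = -beta, (beta * beta - 2.0) / 5.0, -1.0 / 9.0
    p = C - B * B / 3.0                         # depressed cubic s^3 + p s + q = 0
    q = 2.0 * B ** 3 / 27.0 - B * C / 3.0 + D
    if 4.0 * p ** 3 + 27.0 * q ** 2 > 0.0:
        raise ValueError("the cubic does not have three real roots for this beta")
    m = 2.0 * sqrt(-p / 3.0)
    phi = acos(max(-1.0, min(1.0, 3.0 * q / (p * m)))) / 3.0
    return sorted(m * cos(phi - 2.0 * pi * k / 3.0) - B / 3.0 for k in range(3))

def design(beta, n0=0):
    """Return (points, (a, x, y, z)) for the 16 point class with n0 center points."""
    N = 16 + n0
    u, v, w = cubic_roots(beta)
    a = sqrt(N / (4.0 * (beta + 1.0)))
    x, y, z = sqrt(u) * a, sqrt(v) * a, sqrt(w) * a
    even = [(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]   # even number of minus signs
    points = []
    for cx, cy, cz in [(x, y, z), (y, z, x), (z, x, y)]:
        points += [(s1 * cx, s2 * cy, s3 * cz) for s1, s2, s3 in even]
    points += [(-s1 * a, -s2 * a, -s3 * a) for s1, s2, s3 in even]   # product equals -a^3
    return points + [(0.0, 0.0, 0.0)] * n0, (a, x, y, z)

def Ex(points):
    return sum(p[0] ** 4 for p in points) - 3.0 * sum(p[0] ** 2 * p[1] ** 2 for p in points)

def Fx(points):
    return sum(p[0] * p[1] * p[2] for p in points)

points, (a, x, y, z) = design(3.0)
both_excesses_vanish = (isclose(Ex(points), 0.0, abs_tol=1e-9)
                        and isclose(Fx(points), 0.0, abs_tol=1e-9))
```

For $\beta = 3$ the values of $a$, $x$, $y$, $z$ returned by `design` can be compared with the corresponding row of Table III.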


Table III contains some of the designs of this series. The table was obtained
by substituting for $\beta$ in (7.14) a specific value and solving the cubic equation.
Only the range $\beta > 2.691376$ need be considered. The values given for $x$, $y$, $z$
and $a$ are those to be used when $n_0 = 0$, i.e., when no center points are added;
for $n_0$ center points these values must be multiplied by the factor
$\alpha = [1 + (n_0/16)]^{1/2}$. The design points are obtained from (7.1) and (7.11) with
appropriate values for $x$, $y$, $z$ and $a$ from the table. The value of $\lambda_4/\lambda_2^2$ in the
table is calculated from (7.17) when $N = 16$. For $n_0$ center points these values must be multiplied
by $\alpha^2 = 1 + (n_0/16)$.

THE PROBABILITY IN THE EXTREME TAIL OF A CONVOLUTION¹

David Blackwell and J. L. Hodges, Jr.
University of California, Berkeley

1. Summary. Let $X_1, X_2, \ldots$ be independent and identically distributed
random variables with possible values that are integers whose differences have
g.c.d. one. Assume the m.g.f. of $X_1$ exists in an interval about 0, let $a$ be any
number such that $E(X_1) < a < \sup X_1$, and let $\phi(a, t) = Ee^{t(X_1 - a)}$. There
exists a unique value $t^*(a)$ of $t$ which minimizes $\phi(a, t)$ with respect to $t$; write
$m(a) = \phi[a, t^*(a)]$ and $z = e^{-t^*(a)}$. Let $Y_1, Y_2, \ldots$ be independent and
identically distributed random variables such that $Y_1$ and $X_1$ have the same range
and $\Pr(Y_1 = x) = \Pr(X_1 = x)\,e^{t^*(a)(x - a)}/m(a)$, and let $\mu_2 = \sigma^2$, $\mu_3$, $\mu_4$ be
central moments of $Y_1$.

We show that $\Pr\{X_1 + \cdots + X_n = na\} = [m(a)]^n \Pr\{Y_1 + \cdots + Y_n = na\}$,
and use this to establish the approximation $\Pr\{X_1 + \cdots + X_n = na\} =
\pi_n^{**}[1 + O(n^{-2})]$, where $na$ is a possible value of $X_1 + \cdots + X_n$ and

$$\pi_n^{**} = \frac{[m(a)]^n}{\sigma\sqrt{2\pi n}}\left\{1 + \frac{1}{8n}\left[\frac{\mu_4}{\mu_2^2} - 3 - \frac{5\mu_3^2}{3\mu_2^3}\right]\right\}.$$

Similarly we find that $\Pr\{X_1 + \cdots + X_n \ge na\} = \Pi_n^{**}[1 + O(n^{-2})]$, where

$$\Pi_n^{**} = \frac{[m(a)]^n}{(1 - z)\,\sigma\sqrt{2\pi n}}\left\{1 + \frac{1}{8n}\left[\frac{\mu_4}{\mu_2^2} - 3 - \frac{5\mu_3^2}{3\mu_2^3}\right] - \frac{1}{2n}\,\frac{z\mu_3/\mu_2 + z(1 + z)/(1 - z)}{(1 - z)\mu_2}\right\}.$$

We provide some numerical illustrations of the accuracy of these approximations,
and give a conjectured analog of the leading term of $\Pi_n^{**}$ for nonlattice variables.

2. Introduction. Let $X_1, X_2, \ldots$ be independent identically distributed
random variables whose common moment generating function $Ee^{tX_1}$ is finite in
some interval about 0, and let $a$ be any number such that $E(X_1) < a < \sup X_1$.
We shall be interested in the tail probability

$$\Pi_n(a) = \Pr\{X_1 + \cdots + X_n \ge na\}.$$

As $n \to \infty$ we shall of course have $\Pi_n(a) \to 0$, since $na$ exceeds the expected
value of the sum by about $\sqrt n$ standard deviations. The study of the speed
with which $\Pi_n(a) \to 0$ was initiated by Cramér [2] in 1938; his results were
extended by Feller [3] and Chernoff [1].

Denote by $\phi(a, t)$ the moment generating function of $X_1 - a$: $\phi(a, t) =
Ee^{t(X_1 - a)}$. Chernoff shows that for each $a$ there is a unique value of $t$, say $t^*(a)$,
for which $\phi$ achieves its minimum, and writes $\phi[a, t^*(a)] = m(a)$. He shows that,
for every $\epsilon > 0$,

$$[m(a) - \epsilon]^n \le \Pi_n(a) \le [m(a)]^n$$

for sufficiently large $n$, with the right inequality holding for all $n$.

This result establishes in a sense the speed with which $\Pi_n(a) \to 0$, but it is
not precise enough to permit the approximation of $\Pi_n(a)$ with a small relative
error, since the ratio of upper to lower bound tends to infinity with $n$. There
remains the problem of developing a relatively accurate approximation for
$\Pi_n(a)$. Cramér [2] has found such an approximation for the case in which $X_1$
has an absolutely continuous component. We are interested in the case of lattice
variables, i.e., the case in which there are constants $A \ne 0$ and $B$ such that
$AX_1 + B$ has only integer values.

Received September 26, 1958; revised August 3, 1959.

¹ This paper was prepared with the partial support of the Office of Naval Research (Nonr-
222-53) and (Nonr-222-43). This paper in whole or in part may be reproduced for any
purpose of the United States Government.


3. An identity. In this section we restrict attention to sequences $\{X_n\}$ of
discrete variables.

THEOREM 1: Let $X_1, X_2, \ldots$ be independent identically distributed discrete
variables whose common moment generating function $E(e^{tX_1})$ is finite for some
interval about 0. For any $a$ with $E(X_1) < a < \sup X_1$, let

$$m(a) = \min_t Ee^{t(X_1 - a)} = \min_t \phi(a, t) = \phi[a, t^*(a)], \text{ say},$$

and let $Y_1, Y_2, \ldots$ be independent identically distributed discrete variables whose
common distribution is defined by

$$\Pr\{Y_1 = x\} = \Pr\{X_1 = x\}\exp[t^*(a)(x - a)]/m(a) \quad \text{for all } x.$$

Then for all $n$,

$$\Pr\{X_1 + \cdots + X_n = na\} = [m(a)]^n \Pr\{Y_1 + \cdots + Y_n = na\}.$$

The shift from the random variable $X$ to the random variable $Y$ has the
effect of moving our event from the extreme tail to the center, since $na$ is just
the expected value of $Y_1 + \cdots + Y_n$. This shift is not new. It is essentially
carried out in Cramér's original paper. Wald [6] made a similar change in his
"conjugate" distribution, introduced in the study of a problem arising in
sequential analysis. Shannon [4] encountered the shift in a problem of information
theory, and remarked (p. 15): "These tilted probabilities are convenient in
evaluating the 'tails' of distribution that are sums of other distributions."

PROOF OF THEOREM 1: As noted by Chernoff, $\phi(a, t)$ is for each $a$ a strictly
convex function of $t$ and attains its minimum at a unique $t = t^*(a)$. Write
$p(x) = \Pr\{X_1 = x\}$. We have $\phi(a, t) = \sum_x p(x)e^{t(x - a)}$, so that

(1) $\phi_2[a, t^*(a)] = \sum_x (x - a)\,p(x)\,e^{t^*(a)(x - a)} = 0$,

where $\phi_i$ denotes the partial derivative of $\phi$ with respect to its $i$th argument.
Write $q(x) = p(x)e^{t^*(a)(x - a)}/m(a)$. Then $q(x)$ is a discrete probability
distribution, and (1) asserts that the mean of the $q$ distribution is $a$. Let $Y_1, Y_2, \ldots$
be a sequence of independent identically distributed variables with common
distribution $q$, and let $x_1, \ldots, x_n$ be any sequence of numbers whose sum is
$na$. Then

$$\Pr\{(Y_1, \ldots, Y_n) = (x_1, \ldots, x_n)\} = q(x_1)\cdots q(x_n)$$
$$= p(x_1)\cdots p(x_n)\exp[t^*(a)(x_1 + \cdots + x_n - na)]/[m(a)]^n$$
$$= \Pr\{(X_1, \ldots, X_n) = (x_1, \ldots, x_n)\}/[m(a)]^n.$$

Summing over all sequences $(x_1, \ldots, x_n)$ such that $x_1 + \cdots + x_n = na$ yields
the assertion of the theorem.
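The identity of Theorem 1 can be verified numerically for a small example. The sketch below is an illustration, not part of the paper: the function names are assumptions, $t^*(a)$ is located by bisection on the derivative in (1), and the distributions of the sums are obtained by direct convolution. The example takes $X$ uniform on 0, 1, 2 with $a = 3/2$ and $n = 8$.

```python
from math import exp, isclose

def tilt(p, a, lo=-40.0, hi=40.0):
    """Return (t_star, m, q): the minimizer of E exp(t(X - a)), the minimum, and the tilted pmf."""
    def slope(t):   # derivative of phi(a, t) in t; increasing in t because phi is convex
        return sum((x - a) * px * exp(t * (x - a)) for x, px in p.items())
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    t_star = 0.5 * (lo + hi)
    m = sum(px * exp(t_star * (x - a)) for x, px in p.items())
    q = {x: px * exp(t_star * (x - a)) / m for x, px in p.items()}
    return t_star, m, q

def pmf_of_sum(p, n):
    """Distribution of the sum of n independent copies of a variable with pmf p."""
    dist = {0: 1.0}
    for _ in range(n):
        new = {}
        for s, ps in dist.items():
            for x, px in p.items():
                new[s + x] = new.get(s + x, 0.0) + ps * px
        dist = new
    return dist

p = {0: 1.0 / 3.0, 1: 1.0 / 3.0, 2: 1.0 / 3.0}
a, n = 1.5, 8
t_star, m, q = tilt(p, a)
lhs = pmf_of_sum(p, n)[12]                 # Pr{X_1 + ... + X_n = na}
rhs = m ** n * pmf_of_sum(q, n)[12]        # [m(a)]^n Pr{Y_1 + ... + Y_n = na}
identity_holds = isclose(lhs, rhs, rel_tol=1e-9)
```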

Theorem 1 extends to $M$-dimensional variables without change. We shall not
use this extension, but give it for completeness.

THEOREM 2: Let $X_1, X_2, \ldots$ be independent identically distributed discrete
$M$-dimensional variables, and let $a$ be any interior point of the convex hull of the
range of $X_1$, $a \ne \mu$, where $\mu = E(X_1)$. Suppose that there is a positive number $b$
such that the moment generating function $Ee^{t\cdot X_1}$ is finite for all $t$ for which $|t| < b$
and $t\cdot(a - \mu) \ge 0$ where, for any $t = (t_1, \ldots, t_M)$, $x = (x_1, \ldots, x_M)$, $t\cdot x$
denotes the inner product $\sum t_ix_i$. Then the moment generating function of $X_1 - a$
achieves its minimum value $m(a)$, say, at a unique $t = t^*(a)$, say, and, if $Y_1, Y_2, \ldots$
are independent identically distributed discrete $M$-dimensional variables whose
common distribution is defined by $\Pr\{Y_1 = x\} = \Pr\{X_1 = x\}e^{t^*(a)\cdot(x - a)}/m(a)$, then $Y_1$
has mean $a$ and, for all $n$,

$$\Pr\{X_1 + \cdots + X_n = na\} = [m(a)]^n \Pr\{Y_1 + \cdots + Y_n = na\}.$$

The proof parallels that of Theorem 1. Again, the moment generating function
has a minimum $m(a)$ at a unique $t^*$, at which $\partial\phi/\partial t_i = 0$ for all $i$. These
equations assert that the $q$ distribution defined by $q(x) = \Pr(X_1 = x)e^{t^*\cdot(x - a)}/m(a)$
has mean $a$, and the rest of the proof is as before.


4. The individual term. In this section we shall specialize to the case of lattice
variables. This means that it is possible by a linear transformation to assure
that the values of $X_1$ are integers whose differences have g.c.d. 1; we assume
this reduction has been carried out. We are then able to develop expressions for
$\pi_n(a)$, using a method exploited for example by von Mises [5, Sec. 8].

Let $\sigma^2 = \mu_2$, $\mu_3$, $\mu_4$ be central moments of $Y$ of order 2, 3, 4. We shall
establish

THEOREM 3: If $X_1, X_2, \ldots$ are integer-valued variables satisfying the
hypotheses of Theorem 1, the approximation

$$\pi_n^*(a) = \frac{[m(a)]^n}{\sigma\sqrt{2\pi n}}$$

for $\pi_n(a) = \Pr\{X_1 + \cdots + X_n = na\}$ has relative error of order $n^{-1}$, while the
approximation

$$\pi_n^{**}(a) = \pi_n^*(a)\left\{1 + \frac{1}{8n}\left[\frac{\mu_4}{\mu_2^2} - 3 - \frac{5\mu_3^2}{3\mu_2^3}\right]\right\}$$

for $\pi_n(a)$ has relative error of order $n^{-2}$.

PROOF: In general, if a random variable $U$ with characteristic function $\eta$ has
only integral values it is easy to check [5] that

$$\Pr(U = u) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-iut}\eta(t)\,dt.$$

Since $Y_1 + \cdots + Y_n$ is such a random variable, we have

$$\Pr(Y_1 + \cdots + Y_n = na) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-inat}\zeta^n(t)\,dt,$$

where $\zeta(t)$ is the characteristic function of $Y$ and $na$ is an integer. Finally, if we
write $\psi(t) = e^{-iat}\zeta(t)$ for the characteristic function of $Y - a$, we have

$$\Pr(Y_1 + \cdots + Y_n = na) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\psi^n(t)\,dt.$$

To evaluate this integral, let us first take it over the range $|t| < \log n/\sqrt n$.
If we make the usual expansion of $\log\psi(t)$ in terms of the cumulants $\kappa_r$ of $Y_1 - a$,
observe $\kappa_1 = 0$, and write $\kappa_2 = \sigma^2$, we find

$$\psi^n(t) = e^{-n\sigma^2t^2/2}\exp\left\{n\sum_{r=3}^{6}\frac{\kappa_r(it)^r}{r!} + o(n^{-2})\right\}$$

when $|t| < \log n/\sqrt n$. The transformation $\sqrt n\,\sigma t = u$ and series expansion
of the second factor puts the integrand in the form

(2) $$e^{-u^2/2}\left\{1 - \frac{i\kappa_3u^3}{6\sigma^3\sqrt n} + \frac{1}{n}\left[\frac{\kappa_4u^4}{24\sigma^4} - \frac{\kappa_3^2u^6}{72\sigma^6}\right] + \frac{uP_1}{n^{3/2}} + \frac{P_2}{n^2} + o(n^{-2})\right\}$$

over $|u| \le \sigma\log n$, where $P_i$ denotes a polynomial in $u^2$. Using the fact that

(3) $$\int_{-\sigma\log n}^{\sigma\log n} u^pe^{-u^2/2}\,du = 2^{(p+1)/2}\,\Gamma\!\left(\frac{p + 1}{2}\right) + o(n^{-2})$$

when $p$ is even, and vanishes when $p$ is odd, we find

(4) $$\frac{1}{2\pi}\int_{-\log n/\sqrt n}^{\log n/\sqrt n}\psi^n(t)\,dt = \frac{1}{\sigma\sqrt{2\pi n}}\left\{1 + \frac{1}{8n}\left[\frac{\mu_4}{\mu_2^2} - 3 - \frac{5\mu_3^2}{3\mu_2^3}\right] + O(n^{-2})\right\},$$

where we have expressed the cumulants in terms of the central moments $\mu_r$.
Turning now to the range $\log n/\sqrt n \le |t| \le \pi$, we shall show that this
part of the integral is negligible. Since $\kappa_1 = 0$ and $0 < \sigma^2 < \infty$, we can find
$0 < t_0 < \pi$ such that $|\psi(t)| \le 1 - (\sigma^2t^2/3)$ for $|t| < t_0$. Therefore, over the
range $\log n/\sqrt n \le |t| \le t_0$,

$$\left|\int\psi^n(t)\,dt\right| \le 2\int_{\log n/\sqrt n}^{t_0}\left[1 - \frac{\sigma^2t^2}{3}\right]^n dt,$$

which is $o(n^{-k})$ for all $k$. As for $t_0 \le |t| \le \pi$, note first that our assumption
that the possible values of $X_1$ are integers whose differences have g.c.d. 1
implies that, when $0 < |t| \le \pi$, the points $e^{itx}$ can never all coincide, and hence
that $\sum_x q(x)e^{itx}$ lies inside the unit circle. Therefore

$$|\psi(t)| = \left|e^{-iat}\sum_x q(x)e^{itx}\right| < 1 \quad \text{for } t_0 \le |t| \le \pi,$$

and by the continuity of $\psi$ there is a number $\rho < 1$ for which $|\psi(t)| < \rho$ in
this range, over which $\int\psi^n(t)\,dt$ is $o(\rho^n)$. We may therefore take the right side
of (4) as an expression for $(1/2\pi)\int_{-\pi}^{\pi}\psi^n(t)\,dt$, and hence for

$$\Pr(Y_1 + \cdots + Y_n = na).$$

This fact, combined with Theorem 1, proves Theorem 3.

We present in Table 1 a few illustrations of the accuracy of the two
approximations. Here, by the relative error of an approximation $\pi'$ for a quantity $\pi$ we
mean $(\pi'/\pi) - 1$. The values of $X$ are $0, 1, \ldots, r - 1$.


5. The tail probability. An extension of the methods used above provides
expressions for the tail probability. We are indebted to D. A. Darling for
suggestions which led to this result.

THEOREM 4: If $X_1, X_2, \ldots$ are integer-valued random variables satisfying the
hypotheses of Theorem 1, then the approximation

$$\Pi_n^*(a) = \pi_n^*(a)/(1 - z)$$

for $\Pi_n(a) = \Pr\{X_1 + \cdots + X_n \ge na\}$ has relative error of order $n^{-1}$, while the
approximation

$$\Pi_n^{**}(a) = \Pi_n^*(a)\left\{1 + \frac{1}{8n}\left[\frac{\mu_4}{\mu_2^2} - 3 - \frac{5\mu_3^2}{3\mu_2^3}\right] - \frac{1}{2n}\,\frac{z\mu_3/\mu_2 + z(1 + z)/(1 - z)}{(1 - z)\mu_2}\right\}$$

for $\Pi_n(a)$ has relative error of order $n^{-2}$.


[Table 1: relative error of the approximations $\pi_n^*(a)$ and $\pi_n^{**}(a)$.]

PROOF: An easy modification of Theorem 1 shows that, for any integer $k$,

$$\pi_n(k) = \Pr(X_1 + \cdots + X_n = na + k) = [m(a)]^n e^{-kt^*(a)}\Pr(Y_1 + \cdots + Y_n = na + k),$$

while the proof of Theorem 3 gives

$$\Pr(Y_1 + \cdots + Y_n = na + k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-ikt}\psi^n(t)\,dt.$$

Summation over $k$ now gives

$$\sum_{k=0}^{K}\pi_n(k) = \frac{[m(a)]^n}{2\pi}\int_{-\pi}^{\pi}\psi^n(t)\,\frac{1 - (ze^{-it})^{K+1}}{1 - ze^{-it}}\,dt.$$

Because of the boundedness of the integrand, we may pass to the limit inside
the integral to get

(5) $$\Pi_n(a) = \frac{[m(a)]^n}{2\pi}\int_{-\pi}^{\pi}\frac{\psi^n(t)}{1 - ze^{-it}}\,dt,$$

where $z = e^{-t^*(a)} < 1$.


The evaluation of this integral is much like that in the proof of Theorem 3.
Since $1/(1 - ze^{-it})$ is bounded, the integral over $|t| \ge \log n/\sqrt n$ is negligible
as before. As before, we substitute $\sqrt n\,\sigma t = u$, and find that when $|u| < \sigma\log n$,

$$\frac{1}{1 - ze^{-it}} = \frac{1}{1 - z} - \frac{izu}{\sigma(1 - z)^2\sqrt n} - \frac{z(1 + z)u^2}{2\sigma^2(1 - z)^3n} + \frac{uP_3}{n^{3/2}} + \frac{P_4}{n^2} + o(n^{-2}),$$

where the $P_i$ again denote polynomials in $u^2$. Combining this with (2), and
integrating the various terms with the aid of (3), we find

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\psi^n(t)}{1 - ze^{-it}}\,dt = \frac{1}{(1 - z)\,\sigma\sqrt{2\pi n}}\left\{1 + \frac{1}{8n}\left[\frac{\mu_4}{\mu_2^2} - 3 - \frac{5\mu_3^2}{3\mu_2^3}\right] - \frac{z}{2n}\,\frac{\mu_3(1 - z) + \sigma^2(1 + z)}{\sigma^4(1 - z)^2} + O(n^{-2})\right\}.$$

This, combined with (5), yields Theorem 4.
We present in Table 2 a few illustrations of the accuracy of the
approximations. As in Table 1, the values of $X$ are $0, 1, \ldots, r - 1$.
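The four approximations can be compared with exact values for a small lattice example. The following is a minimal sketch, not the authors' computation: the function names are assumptions, $t^*(a)$ is found by bisection, exact probabilities come from direct convolution, and the second-order tail approximation is coded in the additive form of the last display. A different grouping of the $O(1/n)$ terms changes it only at order $n^{-2}$, so its figures need not agree with tabulated values to all digits.

```python
from math import exp, sqrt, pi

def tilt(p, a, lo=-40.0, hi=40.0):
    """Return (t_star, m, q) for a finite pmf p = {value: probability}."""
    def slope(t):   # derivative in t of E exp(t(X - a)); increasing by convexity
        return sum((x - a) * px * exp(t * (x - a)) for x, px in p.items())
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    t_star = 0.5 * (lo + hi)
    m = sum(px * exp(t_star * (x - a)) for x, px in p.items())
    return t_star, m, {x: px * exp(t_star * (x - a)) / m for x, px in p.items()}

def approximations(p, a, n):
    """Return (pi*, pi**, Pi*, Pi**) for integer-valued X with pmf p."""
    t_star, m, q = tilt(p, a)
    z = exp(-t_star)
    mu2, mu3, mu4 = (sum(qx * (x - a) ** k for x, qx in q.items()) for k in (2, 3, 4))
    first = (mu4 / mu2 ** 2 - 3.0 - 5.0 * mu3 ** 2 / (3.0 * mu2 ** 3)) / (8.0 * n)
    second = z * (mu3 * (1.0 - z) + mu2 * (1.0 + z)) / (2.0 * n * mu2 ** 2 * (1.0 - z) ** 2)
    pi1 = m ** n / sqrt(2.0 * pi * n * mu2)
    Pi1 = pi1 / (1.0 - z)
    return pi1, pi1 * (1.0 + first), Pi1, Pi1 * (1.0 + first - second)

def exact(p, a, n):
    """Return (Pr{S_n = na}, Pr{S_n >= na}) by convolution; na is assumed to be an integer."""
    dist = {0: 1.0}
    for _ in range(n):
        new = {}
        for s, ps in dist.items():
            for x, px in p.items():
                new[s + x] = new.get(s + x, 0.0) + ps * px
        dist = new
    target = round(n * a)
    return dist.get(target, 0.0), sum(pr for s, pr in dist.items() if s >= target)

uniform3 = {0: 1.0 / 3.0, 1: 1.0 / 3.0, 2: 1.0 / 3.0}
pi1, pi2, Pi1, Pi2 = approximations(uniform3, 1.5, 8)
point, tail = exact(uniform3, 1.5, 8)
relative_errors = (pi1 / point - 1.0, pi2 / point - 1.0, Pi1 / tail - 1.0, Pi2 / tail - 1.0)
```

For the uniform distribution on 0, 1, 2 with $a = 3/2$ and $n = 8$, the exact tail probability is $423/6561$, and the relative error of $\Pi_n^*$ computed this way is about 0.148.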














TABLE 2
Relative error of $\Pi_n^*$ and $\Pi_n^{**}$

  r   probabilities        a      n    Pi_n(a)          Pi_n*     Pi_n**
  3   1/3, 1/3, 1/3        3/2    8    .064471879       0.148     -.0888
                                  16   .0099484233      0.0846    -.0289
                                  32   .00030990382     0.0465    -.00861
  3   1/2, 1/4, 1/4        3/2    8    .011276245       0.0862    -.0259
                                  16   .00035039405     0.0474    -.00733
  4   1/4, 1/4, 1/4, 1/4   9/4    8    .040328979       0.134     -.0705
                                  16   .0044785112      0.0757    -.0224

6. The nonlattice case. For nonlattice variables, let us heuristically treat
$\pi_n^*(a) = m^n(a)/\sigma(a)\sqrt{2\pi n}$ as an approximation to the (in general
nonexistent) density of $X_1 + \cdots + X_n$ at the point $na$, and proceed formally.

$$\Pr\{X_1 + \cdots + X_n \ge na\} \approx \int_0^\infty \pi_n^*\!\left(a + \frac{x}{n}\right)dx$$
$$\approx \pi_n^*(a)\int_0^\infty\left[\frac{m(a + x/n)}{m(a)}\right]^n dx$$
$$\approx \pi_n^*(a)\int_0^\infty\exp[x\,m'(a)/m(a)]\,dx$$
$$= \pi_n^*(a)\int_0^\infty\exp[-x\,t^*(a)]\,dx = \pi_n^*(a)/t^*(a).$$

We thus obtain the approximation

$$\Pi_n^*(a) = \pi_n^*(a)/t^*(a).$$

We conjecture that this approximation has a relative error which is $O(n^{-1})$,
just as the corresponding approximation did in the lattice case. For variables
with an absolutely continuous component, $\Pi_n^*(a)$ is just the leading term in the
expansion obtained by Cramér [2] and is thus known to be correct. The
conjecture is supported by numerical evidence for the case in which $X$ has values 0, 1,
and $\sqrt2$ with equal probabilities. We have computed a portion of the tail of
this distribution for $n = 64$, which is shown in Fig. 1 with the approximation
superimposed.

BOUNDS ON NORMAL APPROXIMATIONS TO STUDENT'S
AND THE CHI-SQUARE DISTRIBUTIONS¹

By David L. Wallace
University of Chicago

1. Summary. Formulas closely related to

$$u(t) = [n\log(1 + t^2/n)]^{1/2},$$
$$w(\chi^2) = [\chi^2 - n - n\log(\chi^2/n)]^{1/2}$$

are considered for converting upper tail values of Student's $t$ or chi-square
variates with $n$ degrees of freedom to normal deviates. The chief object of the
paper is to construct bounds on the deviation from the exact normal deviates
such that the absolute deviation is bounded by $cn^{-1/2}$ uniformly in the entire tail.
Two approximations for Student's $t$ are suggested that are remarkably accurate
and an improvement over other available approximations. The bounds and
approximations for Student's $t$ are given in Section 3 and those for chi-square in
Section 4. Some of the methods used in obtaining bounds may be of value in other
investigations. These are given in Section 2.

The development of the bounds was stimulated by the work of Teichroew [3].
He obtains expansions for the normal deviates corresponding to tail values of
Student's $t$ and chi-square and achieves spectacular accuracy even for small $n$.
The idea and the construction of the expansion is set forth, briefly, in [4], p. 647.
The first terms of these expansions are the $u(t)$ and $w(\chi^2)$ used here. The bounds
of Theorems 3.1 and 4.2 show that these first approximations are correct to
$O(n^{-1/2})$ uniformly for all $t > 0$ or $\chi^2 > n$. This fact can be used to show that the
Teichroew expansions are valid asymptotic expansions.


2. Some results useful for obtaining bounds. Let $F$ be an arbitrary, absolutely
continuous distribution function with density function $f$, let $\Phi$, $\phi$ be respectively
the unit normal distribution and density functions, and let $x(t)$ be the root of
$\Phi(x) = F(t)$ (i.e. $x(t)$ is the normal deviate corresponding to the argument $t$
of $F$). The problem considered is that of finding bounds on $x(t)$ for a given $F$.
Any numerical bound on $F(t)$ can be converted numerically to a bound on $x(t)$.
Frequently, though, a simple analytic expression for the bound is useful. An
inequality $F(t) \le \Phi(z(t))$ yields directly the bound $x(t) \le z(t)$. Two simple
sufficient conditions for such inequalities are given in Theorems 2.1 and 2.2.

Often, however, only a weaker inequality $1 - F(t) \le c[1 - \Phi(z(t))]$ can be
obtained in which $c$ is typically slightly greater than one (and may depend on $t$).
A simple bound on $x(t)$ can still be obtained analytically, although it is not as
strong as the one that could be obtained numerically. The bounds are obtained
by using the normal tail inequalities and become relatively stronger as $z(t)$
increases. For the two directions of inequality, results are given in Theorems 2.3
and 2.4.

Assume throughout that the density $f(t)$ is positive and continuous for $a < t < \infty$
and that any approximation $z(t)$ to $x(t)$ is a continuously differentiable, strictly
increasing function for $a < t < \infty$. ($a$ is any appropriately chosen constant, which
can be $-\infty$ but need not be the lower boundary of the domain of $f$.)

Denote by $g$ the function

(2.1) $$g(t) = \frac{\phi(z(t))\, z'(t)}{f(t)}.$$

THEOREM 2.1. If

(a) $\lim_{t \to \infty} z(t) = \infty$,

(b) $\lim_{t \to a} F(t) = \Phi(\lim_{t \to a} z(t))$,

(c) $\mathrm{sgn}\,[g(t) - 1]$ is a monotonic function of $t$ for $a < t < \infty$,

then $x(t) \ge z(t)$ or $x(t) \le z(t)$ for all $a < t < \infty$ according as the function in (c) is
increasing or decreasing.

THEOREM 2.2. If $g(t) \ge 1/c$ $(\le)$ for all $a < t < \infty$, then

$$1 - F(t) \le c[1 - \Phi(z(t))] \quad (\ge) \quad \text{for all } a < t < \infty.$$

If $c = 1$, then $x(t) \ge z(t)$ $(\le)$.

PROOF. Let $\delta(t) = F(t) - \Phi(z(t))$, so that

$$\delta(t) = \int_{z(t)}^{\infty} \phi(u)\, du - \int_t^{\infty} f(s)\, ds.$$

In the first integral, make the substitution $u = z(s)$, so that

$$\delta(t) = \int_t^{\infty} f(s)[g(s) - 1]\, ds.$$

By (a) and (b), $\delta(a) = 0 = \delta(\infty)$ and if, by (c), $\mathrm{sgn}\,[g(s) - 1]$ is, say, increasing
in $s$, then $\delta(t) \ge 0$ and $\Phi(x(t)) = F(t) \ge \Phi(z(t))$ and $x(t) \ge z(t)$ for all
$a < t < \infty$. Theorem 2.2 follows directly from $\delta(t) \ge [1 - F(t)](1 - c)/c$ $(\le)$.
Both theorems clearly hold if $\Phi$ is any distribution function with continuous
positive density on the entire real line.
THEOREM 2.3. If, for some value of $t$ such that $z_1(t) > 0$, $F(t)$ satisfies an
inequality

(2.2) $$1 - F(t) \le c_1[1 - \Phi(z_1(t))]$$

with $c_1 \ge 1$, and if, in addition either (a) $x(t) \ge -z_1(t)$ or (b) $[1 - \Phi(z_1(t))] \le 1/(1 + c_1)$
holds, then

$$x(t) \ge z_1(t) - \frac{c_1 - 1}{z_1(t)}.$$

THEOREM 2.4. If, for some value of $t$ such that $z_2(t) > 0$, $F(t)$ satisfies an
inequality

(2.3) $$1 - F(t) \ge c_2[1 - \Phi(z_2(t))]$$

with $0 < c_2 \le 1$, then

$$x(t) \le z_2(t) + \frac{1 - c_2}{c_2\, z_2(t)}.$$

If $c_1 \le 1$ in (2.2) it may be replaced by 1 and the bound $x(t) \ge z_1(t)$ used.
$c_2 \ge 1$ in (2.3) can be handled similarly. Results taking advantage of these
constants can be obtained but are rather poor.
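The conversions in Theorems 2.3 and 2.4 can be tried numerically. The following is only a sketch under stated assumptions: the function names are hypothetical, and the constant $c$ is chosen so that (2.2) or (2.3) holds with equality for a given pair of deviates.

```python
from statistics import NormalDist

PHI = NormalDist()  # unit normal distribution


def upper_tail(u):
    """Return 1 - Phi(u)."""
    return 1.0 - PHI.cdf(u)


def lower_bound_2_3(z1, c1):
    """Theorem 2.3: x >= z1 - (c1 - 1)/z1, for z1 > 0 and c1 >= 1."""
    return z1 - (c1 - 1.0) / z1


def upper_bound_2_4(z2, c2):
    """Theorem 2.4: x <= z2 + (1 - c2)/(c2 * z2), for z2 > 0 and 0 < c2 <= 1."""
    return z2 + (1.0 - c2) / (c2 * z2)


def bound_holds(x, z):
    """Check the applicable theorem for an exact deviate x > 0 and approximation z > 0.

    c is the ratio of the two upper tails, so (2.2) or (2.3) holds with equality.
    Since x > 0 > -z, condition (a) of Theorem 2.3 is satisfied automatically.
    """
    c = upper_tail(x) / upper_tail(z)
    if c >= 1.0:  # x <= z, so Theorem 2.3 applies
        return x >= lower_bound_2_3(z, c) - 1e-12
    return x <= upper_bound_2_4(z, c) + 1e-12  # otherwise Theorem 2.4 applies
```

For instance, with exact deviate 1.0 and approximation 1.2 the constant is about 1.38, and Theorem 2.3 gives a lower bound of roughly 0.88.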

PROOFS. By definition of $x(t)$, $1 - F(t) = 1 - \Phi(x(t))$. Henceforth the
argument $t$ in $x(t)$ and $z(t)$ will be dropped. The proofs use the Taylor expansion

$$\Phi(z) = \Phi(x) + (z - x)\phi(\theta z + (1 - \theta)x), \qquad 0 \le \theta \le 1,$$

and the normal tail inequality

(2.4) $$1 - \Phi(u) < \frac{\phi(u)}{u}, \qquad u > 0.$$

Inequality (2.2) and condition (b) of Theorem 2.3 together imply condition
(a) since $1 - \Phi(x) \le c_1[1 - \Phi(z_1)] \le c_1/(1 + c_1)$ so that $\Phi(x) \ge 1/(1 + c_1) \ge
1 - \Phi(z_1) = \Phi(-z_1)$ and hence $x \ge -z_1$.

Eliminating $\Phi(x)$ between inequality (2.2) and the Taylor expansion and
solving for $x$ gives

(2.5) $$x \ge z_1 - \frac{(c_1 - 1)[1 - \Phi(z_1)]}{\phi(\theta z_1 + (1 - \theta)x)}.$$

Let $c_1 \ge 1$ and assume first that $x < z_1$ so that, with condition (a), $|x| \le z_1$
and hence $\phi(\theta z_1 + (1 - \theta)x) \ge \phi(z_1)$. Using this and inequality (2.4) in
inequality (2.5), $x \ge z_1 - (c_1 - 1)/z_1$. But this holds trivially when $x \ge z_1$, so
that Theorem 2.3 is proved.

Eliminating $\Phi(x)$ between inequality (2.3) and the Taylor expansion and
solving for $x$ gives

(2.6) $$x \le z_2 + \frac{(1 - c_2)(1 - \Phi(x))}{c_2\, \phi(\theta z_2 + (1 - \theta)x)}.$$

Let $0 < c_2 \le 1$, and assume first that $x \ge z_2$. Then $\phi(\theta z_2 + (1 - \theta)x) \ge \phi(x)$
and with inequality (2.4),

$$x \le z_2 + \frac{1 - c_2}{c_2\, |x|} \le z_2 + \frac{1 - c_2}{c_2\, z_2}.$$

These inequalities hold trivially when $x \le z_2$, and Theorem 2.4 is proved.
Let $\{F_n(t)\}$ be a sequence of distribution functions and $\{x_n(t)\}$ the corresponding
normal deviates. An approximate normal deviate $z_n(t)$ which is a close
approximation to $x_n(t)$ in the entire tail of the distribution would often be useful.





1124 DAVID L. WALLACE 


The results of this section enable detailed boundings of the errors of such ap- 
proximations from the corresponding distribution function approximations. The 
essential qualitative result is that the absolute deviate error will be of order 
p(n) throughout the entire tail if the per cent error (relative to smaller tail) in 
the distribution function approximation is of order p(n) throughout the tail. 
The result is not quite necessary. 


3. Normal approximations to Student's $t$ distribution. Let $F_n$ be the
distribution function of Student's $t$ on $n$ degrees of freedom.

$$1 - F_n(t) = a_n (2\pi)^{-1/2} \int_t^{\infty} \left(1 + \frac{s^2}{n}\right)^{-(n+1)/2} ds, \qquad a_n = \left(\frac{2}{n}\right)^{1/2} \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}.$$

Denote by $x_n(t)$ the normal deviate corresponding to the deviate $t$ of Student's
distribution. Chu [1] has studied the normal approximation $\Phi(t)$ of $F_n(t)$. He
was not concerned with approximations in the extreme tails of the distribution nor
with quantile approximations; but methods similar to his can be used.
Bounds on the deviate $x_n(t)$ are given by
THEOREM 3.1. For all $t > 0$ and with $u(t) = [n \log(1 + t^2/n)]^{1/2}$ and $k = .368$,

(a) $x_n(t) \le u(t)$, $n > 0$;
(b) $x_n(t) \ge u(t)(1 - (1/2n))^{1/2} = u_2(t)$, $n > .50$;
(c) $x_n(t) \ge u(t) - k/n^{1/2} = u_3(t)$, $n \ge .50$.

COROLLARY. Inequality (b) can be written as

(b') $x_n(t) \ge u(t)(1 - b_1/n)$, $n \ge n_0 > .50$,

with $b_1 = n_0[1 - (1 - 1/2n_0)^{1/2}]$. Three numerical values of $b_1$ which will suffice
for almost all uses are: $n_0 = 1$, $b_1 = .293$; $n_0 = 3$, $b_1 = .262$; $n_0 = 10$, $b_1 = .254$.

The bounds show that $u(t)$, as an approximation to $x_n(t)$, has an absolute error
not exceeding $.368n^{-1/2}$ and a relative error (relative to $u(t)$) not exceeding $b_1/n$.
Except for very large values of $t$, the bound (c) is much poorer than the bound
(b). The main interest in (c) is the rather remarkable fact that even as $t$ and
$x_n(t)$ increase indefinitely the error remains bounded and even of order $n^{-1/2}$. An
interesting theoretical application will be noted in Section 6.
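The bounds of Theorem 3.1 are easy to evaluate. The sketch below is illustrative only: the function names are hypothetical, and the exact deviate is computed only for $n = 1$, where Student's $t$ is the Cauchy distribution and the tail has the closed form $1/2 - \arctan(t)/\pi$.

```python
import math
from statistics import NormalDist

K = 0.368  # the constant k of Theorem 3.1


def u(t, n):
    """Upper bound (a): u(t) = [n log(1 + t^2/n)]^(1/2)."""
    return math.sqrt(n * math.log1p(t * t / n))


def theorem_3_1_bounds(t, n):
    """Return (lower, upper) bounds on x_n(t), for t > 0 and n >= .50."""
    upper = u(t, n)
    lower_b = upper * math.sqrt(1.0 - 1.0 / (2.0 * n))  # bound (b), u_2(t)
    lower_c = upper - K / math.sqrt(n)                   # bound (c), u_3(t)
    return max(lower_b, lower_c), upper


def exact_deviate_cauchy(t):
    """x_1(t): normal deviate with the same upper tail as the Cauchy (n = 1) at t."""
    tail = 0.5 - math.atan(t) / math.pi
    return NormalDist().inv_cdf(1.0 - tail)
```

Taking the larger of the two lower bounds reflects the remark above that (b) is the better bound except for very large $t$.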

The derivations of the bounds and a few calculations suggest the following
conjectures on the behavior of $x_n(t)$: that $x_n(t)/u_2(t) \to 1$ as $t \to 0$, and that
$u(t) - x_n(t)$, as a function of $t$, increases monotonically to a maximum value
slightly less than $.368n^{-1/2}$ and then decreases monotonically to zero, the maximum
occurring for $t$ and $n$ for which $u^2(t)/n$ is substantial.

Calculations indicate that the error, $u(t) - x_n(t)$, is close to its maximum value
unless $t$ is very large, so that the maximum of the two bounds (b) and (c) is a
good approximation. Two superior approximations that were obtained
empirically are approximations $u_4(t)$ and $u_5(t)$,

$$u_4(t) = u(t)\, \frac{8n + 1}{8n + 3},$$

$$u_5(t) = u(t) - \frac{2u(t)\,(1 - e^{-s^2})^{1/2}}{8n + 3}, \qquad s = \frac{.368(8n + 3)}{2\sqrt{n}\, u(t)}.$$

For all $n > 1$ and all $t > 0$,

$$u_2(t) < u_4(t) < u(t),$$
$$\max(u_3(t), u_4(t)) < u_5(t) < u(t).$$

$u_4$ was chosen as slightly larger than $u_2$ to give a good fit for small $t^2/n$. $u_5$ was
constructed to be larger than $u_3$ and $u_4$ and to so join them as to give excellent
approximation over a wide range of values. Though the function is somewhat
complicated, it is amenable to slide rule calculation. $u_4$ seems to be within .02 of
$x$ for $t^2/n$ less than about 5 and $u_5$ within .02 of $x$ for a much wider range.

The bounds $u(t)$, $u_2(t)$, $u_3(t)$, the approximations $u_4(t)$, $u_5(t)$, and the
approximation $u_6(t)$ obtained from the Paulson approximation [2] to $F$ are
illustrated in Table 1 for $n = 1, 3, 10$, and selected values of $t$. The Paulson
approximation gives a normal deviate corresponding to the double tail $t$ probability
and hence has to be converted to be comparable.

$$K_n(t) = \frac{\left(1 - \frac{2}{9n}\right) t^{2/3} - \frac{7}{9}}{\left[\frac{2}{9n}\, t^{4/3} + \frac{2}{9}\right]^{1/2}}, \qquad 1 - \Phi(u_6(t)) = .50\,[1 - \Phi(K_n(t))].$$

Polynomial approximations, such as the Hotelling-Frankel approximations, are
very poor for small $n$ or for very large $t$.
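The conversion of the Paulson deviate from the double tail to the single tail can be sketched as follows. The function name is hypothetical, and the formula for $K_n(t)$ is the Paulson form applied to $t^2$ with 1 and $n$ degrees of freedom, which is an assumption about how the approximation was applied here.

```python
import math
from statistics import NormalDist

NORMAL = NormalDist()


def paulson_u6(t, n):
    """Single-tail normal deviate u_6(t) derived from the Paulson deviate K_n(t)."""
    cube_root = (t * t) ** (1.0 / 3.0)  # t^(2/3)
    numerator = (1.0 - 2.0 / (9.0 * n)) * cube_root - 7.0 / 9.0
    denominator = math.sqrt(2.0 / (9.0 * n) * cube_root ** 2 + 2.0 / 9.0)
    k_n = numerator / denominator
    single_tail = 0.5 * (1.0 - NORMAL.cdf(k_n))  # halve the double-tail probability
    return NORMAL.inv_cdf(1.0 - single_tail)
```

At $t = 1$, $n = 1$ the numerator vanishes, the double-tail probability is one half, and $u_6$ is the upper quartile of the normal distribution.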

All bounds and approximations except $u_5(t)$ can easily be inverted analytically
to give bounds or approximations for the Student's deviate corresponding to a
given normal deviate, i.e., for the quantiles of $t$.

The proof of the theorem will be preceded by two lemmas.

LEMMA 1. For all $x > 0$, $h_c(x) = (e^x - 1)/xe^{cx}$ is monotone decreasing for $c \ge 1$,
monotone increasing for $c \le 1/2$ and not monotonic for $1/2 < c < 1$.

PROOF. $h_c'(x) = (1/x^2 e^{cx})[xe^x - (e^x - 1)(cx + 1)]$ and is $\ge 0$ or $\le 0$
according as $xe^x/((e^x - 1)(cx + 1))$ is $\ge 1$ or $\le 1$. The result follows from a termwise
comparison of the Maclaurin expansions of the numerator and denominator.

TABLE 1
Bounds on the normal deviate x_n(t) for Student's distribution
1 - Φ(x_n(t)) = 1 - F_n(t)

                                   Bounds from Theorem 3.1              Approximations
   n   t              Exact     Upper     Lower     Lower
                      x_n(t)    u(t)      u_2(t)    u_3(t)     u_4(t)    u_5(t)    u_6(t)
   1   0.3             .235      .294      .208      <0         .241      .241      .257
       1               .674      .832      .589      .465       .680      .681      .674
       2              1.047     1.269      .897      .901      1.038     1.048     1.031
       4              1.419     1.683     1.190     1.315      1.377     1.416     1.349
       8              1.756     2.043     1.445     1.675      1.672     1.750     1.576
       12             1.935     2.231     1.577     1.863      1.825     1.927     1.670
       10^2           2.729     3.035     2.146     2.667      2.177     2.704     1.896
       10^5           4.514     4.799     3.393     4.431      3.926     4.447     1.964
   3   1               .858      .929      .848      .717       .860      .860      .855
       2              1.478     1.594     1.455     1.382      1.476     1.478     1.477
       4              2.197     2.353     2.148     2.141      2.179     2.197     2.160
       8              2.872     3.053     2.787     2.840      2.826     2.879     2.71
       12             3.228     3.417     3.119     3.204      3.164     3.237     2.935
       √3 × 10^2      5.057     5.256     4.797     5.044      4.866     5.058     3.493
  10   1               .952      .976      .952      .860       .953      .953      .948
       2              1.790     1.834     1.788     1.718      1.790     1.790     1.805
       4              3.021     3.091     3.013     2.975      3.017     3.020     3.014
       8              4.382     4.474     4.361     4.357      4.366     4.384     4.279
       12             5.128     5.229     5.097     5.113      5.103     5.133     4.902
 100   100           21.447    21.483    21.429    21.446     21.429    21.450    18.541


LEMMA 2. For all $x > 0$, $((e^x - 1)e^{2kx^{1/2}})/xe^x \ge 1$ with $k = .368$.

PROOF. The desired inequality is equivalent to the inequality

$$Q(x) = e^x - 1 - xe^{x - 2kx^{1/2}} \ge 0.$$

Let $T$ be defined by

$$Q'(x) = e^{x - 2kx^{1/2}}\{e^{2kx^{1/2}} - (1 - kx^{1/2} + x)\} = e^{x - 2kx^{1/2}}\, T(x).$$

The simultaneous equations in $x$ and $k$: $Q(x) = 0$ and $T(x) = 0$ will have exactly
one solution with positive $x$ and the root for $k$ is (to three decimals) the smallest
value for which the inequality $Q(x) \ge 0$ holds for all $x > 0$. The solution is $k =
.368$ and $x = 7.312$.
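A quick numerical look at Lemma 2 shows how tight the constant is. The sketch below (hypothetical names, a plain grid search rather than a root-finder) evaluates the ratio in the lemma and locates its smallest value on a grid.

```python
import math


def lemma_2_ratio(x, k=0.368):
    """(e^x - 1) e^{2k sqrt(x)} / (x e^x), rewritten as (1 - e^{-x}) e^{2k sqrt(x)} / x."""
    return -math.expm1(-x) * math.exp(2.0 * k * math.sqrt(x)) / x


# Grid over 0 < x <= 50 in steps of 0.001
GRID = [0.001 * i for i in range(1, 50001)]
X_AT_MIN = min(GRID, key=lemma_2_ratio)
MIN_RATIO = lemma_2_ratio(X_AT_MIN)
```

On this grid the minimum sits near the stated solution $x = 7.312$ and is very close to 1, while a slightly smaller constant such as $k = .36$ lets the ratio drop below 1.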

PROOF OF THEOREM. Proceeding as in Theorem 2.1, set $z(t) = \lambda u(t) - \mu$
with $u(t) = [n \log(1 + t^2/n)]^{1/2}$ and with $\lambda$, $\mu$ constants to be chosen. Then form
the function $g(t) = \phi(z(t))z'(t)/f_n(t)$ written as a function of $x = u^2/n$, which is
monotonic in $t$,

$$g(t) = h(x) = \frac{\lambda e^{-\mu^2/2}}{a_n} \left[\frac{e^x - 1}{xe^{cx}}\right]^{1/2} e^{\lambda\mu(nx)^{1/2}}$$

where $c = 1 - n(1 - \lambda^2)$.

First, set $\mu = 0$. Then $z(t) = \lambda u(t)$ satisfies conditions (a), (b) of Theorem 2.1.
Monotony of $g(t)$ and hence condition (c) of Theorem 2.1 follow from Lemma
1: decreasing for $c = 1$ or $\lambda = 1$, increasing for $c = 1/2$ or $\lambda = (1 - (1/2n))^{1/2}$.
Conclusions (a) and (b) follow.

Next set $\lambda = 1$ and $\mu = k/n^{1/2}$ with $k = .368$. Then, using Lemma 2, $g(t) \ge 1$
for all $t > 0$, provided that $a_n e^{k^2/2n} \le 1$. Hence (c) follows from Theorem 2.2.

The proof that $a_n e^{k^2/2n} < 1$ for all $n \ge .50$ and that $(1 - (1/2n))^{1/2} \ge 1 - b_1/n$,
from which the Corollary follows, are given in Section 5.


4. Normal approximations to the chi-square distribution. Let $F_n$ be the
distribution function of chi-square on $n$ degrees of freedom (using $t$ instead of $\chi^2$
as argument).

$$1 - F_n(t) = \frac{1}{2^{n/2}\, \Gamma(n/2)} \int_t^{\infty} s^{n/2 - 1} e^{-s/2}\, ds.$$

Denote by $y_n(t)$ the normal deviate corresponding to the chi-square argument $t$.
Only the upper tail with $t > n$ is treated in this paper. Bounds on $1 - F_n(t)$ and
$y_n(t)$ are given by

THEOREM 4.1. For all $t > n$, all $n > 0$, and with $w(t) = [t - n - n \log(t/n)]^{1/2}$,
and $w_2(t) = w(t) + \frac{1}{3}(2/n)^{1/2}$,

(a) $1 - F_n(t) > d_n e^{1/9n}[1 - \Phi(w_2(t))]$,
(b) $1 - F_n(t) < d_n[1 - \Phi(w(t))]$,

in which

$$d_n = \frac{\left(\frac{n}{2}\right)^{(n-1)/2} e^{-n/2}\, (2\pi)^{1/2}}{\Gamma(n/2)}.$$
THEOREM 4.2. For all $t > n$,

(a) $y_n(t) \le w_2(t) + (1/w_2(t)) \max[0,\ d_n^{-1}e^{-1/9n} - 1]$, $n > 0$,
(b) $y_n(t) \ge w(t)$, $n \ge .37$.

COROLLARY 1. Inequality (a) can be written as

(a') $y_n(t) \le w_2(t) + b_2/nw_2(t)$, $n \ge n_0 > 0$,

with $b_2 = n_0(e^{1/18n_0} - 1)$. Numerical values of $b_2$ which will suffice for almost all
uses are: $n_0 = .37$, $b_2 = .060$; $n_0 = 1$, $b_2 = .058$; $n_0 = 10$, $b_2 = .056$.

COROLLARY 2. For all $t > n$ and all $n \ge .37$,

$$w(t) \le y_n(t) < w(t) + .60n^{-1/2}.$$
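The chi-square bounds can be checked against the exact deviate when $n$ is even, because the upper tail is then a finite Poisson sum. The following sketch uses hypothetical function names and takes the bounds exactly as stated in Theorem 4.2 and Corollary 2; it is an illustration, not the method used to prepare the paper's table.

```python
import math
from statistics import NormalDist


def w(t, n):
    """Lower bound (b): w(t) = [t - n - n log(t/n)]^(1/2), for t > n."""
    return math.sqrt(t - n - n * math.log(t / n))


def w2(t, n):
    """w_2(t) = w(t) + (1/3)(2/n)^(1/2)."""
    return w(t, n) + math.sqrt(2.0 / n) / 3.0


def d(n):
    """The constant d_n of Theorem 4.1, computed through log-gamma."""
    log_d = ((n - 1) / 2.0) * math.log(n / 2.0) - n / 2.0 \
        + 0.5 * math.log(2.0 * math.pi) - math.lgamma(n / 2.0)
    return math.exp(log_d)


def upper_a(t, n):
    """Upper bound (a) of Theorem 4.2."""
    z = w2(t, n)
    return z + max(0.0, math.exp(-1.0 / (9.0 * n)) / d(n) - 1.0) / z


def exact_deviate_even(t, n):
    """y_n(t) for even n, using 1 - F_n(t) = e^{-t/2} * sum_{j < n/2} (t/2)^j / j!."""
    half = t / 2.0
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(n // 2))
    return NormalDist().inv_cdf(1.0 - tail)
```

For $n = 8$ and $t = 12$ this gives an exact deviate near 1.031 and a lower bound near .870, in line with the corresponding entries of Table 2.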


The bounds on $y_n(t)$ are illustrated in Table 2 for $n = 8$ and selected values of
$t$. Shown are bounds (a) and (b), the exact normal deviate $y_n(t)$ and the
Wilson-Hilferty [6] approximate deviate

$$\frac{(t/n)^{1/3} - 1 + \frac{2}{9n}}{\left(\frac{2}{9n}\right)^{1/2}}.$$

TABLE 2
Bounds on the normal deviate y_n(t) for the chi-square distribution
1 - Φ(y_n(t)) = 1 - F_n(t), n = 8

                                          Bounds from Theorem 4.2
   t    (t - n)/(2n)^{1/2}   Exact y_n(t)   Upper (a)   Lower (b)   Wilson-Hilferty
  12            1               1.031         1.095        .869         1.035
  16            2               1.724         1.769       1.566         1.726
  20            3               2.314         2.354       2.160         2.310
  24            4               2.835         2.874       2.685         2.820
  32            6               3.737         3.776       3.593         3.691
  40            8               4.512         4.553       4.373         4.427
  72           16               6.940         6.989       6.813         6.647

The Wilson-Hilferty approximation is much superior to the bounds as
approximations except in the extreme tail and the chief value of the approximation is
the uniform bound of order $n^{-1/2}$ on the error in the tail.
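For comparison with the bounds, the Wilson-Hilferty deviate is a one-line computation. A minimal sketch, with a hypothetical function name:

```python
import math


def wilson_hilferty(t, n):
    """Wilson-Hilferty approximate normal deviate for chi-square argument t on n d.f."""
    scale = 2.0 / (9.0 * n)
    return ((t / n) ** (1.0 / 3.0) - 1.0 + scale) / math.sqrt(scale)
```

With $n = 8$ this gives about 1.035 at $t = 12$ and about 6.647 at $t = 72$, the first and last Wilson-Hilferty entries of Table 2.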

The proof of the theorem will be preceded by a lemma.

LEMMA 3. For all $x > 0$,

(a) $\lambda(x) < x$,
(b) $e^{\lambda(x)/3}\, \lambda(x) > x$,

with $\lambda(x) = 2^{1/2}[x - \log(1 + x)]^{1/2}$.

Inequality (a) follows immediately from $\log(1 + x) > x - x^2/2$. It cannot
be improved by any factor of the form $\exp(k\lambda(x))$.

Inequality (b) is sharp for small $x$ and the coefficient in the exponent cannot
be decreased. Let

$$y_1 = e^{2\lambda/3}, \qquad y_2 = x^2/\lambda^2, \qquad \lambda = \lambda(x).$$

Denote derivatives with respect to $x$ by primes. The proof consists in showing that
$y_1' > 2/3$ and $y_2' < 2/3$ for all $x > 0$, from which it follows that $y_2 < 1 + 2x/3 < y_1$
and hence inequality (b): $y_1^{1/2} > y_2^{1/2}$.








2/3 — ys 2, 3 4 2 a(z) 

2/: — Y= a] M “Te STS _"" 
x} , 3¢(x-3 

B(x) = | tog (1 + 2) 4 Tee. 
Hence $\beta(x) > 0$ for all $x \ge 3$.


6'(x) = a3 — 2) E (l1+2)—- “| + 3z [x' — 3] : 


l+2 2(1 + x)? 
Let 
(1+ Cts) 32° 
(z) = 23 — x) 5 (x) = log (l+2) - + Si +0 >) : 


2(5 — x) 
(1 + z)*(3 — x)? ° 


Hence $\gamma'(x) > 0$ for all $0 < x < 5$. $\gamma(0) = 0$ so that $\gamma(x) > 0$ for all $0 < x < 5$.
Then $\beta'(x) > 0$ for all $0 < x < 3$. Since $\beta(0) = 0$, $\beta(x) > 0$ for $0 < x < 3$,
which, combined with the result for $x \ge 3$, gives $y_2' < 2/3$ and $y_2 < 1 + 2x/3$ for
all $x > 0$.

Let

$$\delta(x) = \frac{3\lambda^3(1 + x)^2\, y_1''}{2y_1}.$$

Then $\delta(x) = \lambda^2 - x^2 + (2/3)x^2\lambda = \lambda^2(1 - y_2) + (2/3)x^2\lambda$. Using the inequalities
$y_2 < 1 + 2x/3$ and $\lambda \le x$ gives $\delta(x) > 0$ for all $x > 0$. Since $y_1'(0) = 2/3$ and
$y_1(0) = 1$, the desired result $y_1 > 1 + 2x/3$ for $x > 0$ follows immediately and
inequality (b) is proved.
PROOF OF THEOREM. Set $z(t) = w(t) + c(2/n)^{1/2}$ and form the function $g(t)$
of (2.1), written as a function of $x = (t - n)/n$; then

$$g(t) = \frac{x\, e^{-c\lambda(x)}\, e^{-c^2/n}}{d_n\, \lambda(x)}$$

with $\lambda(x) = [2(x - \log(1 + x))]^{1/2}$. Using Lemma 3 and Theorem 2.2, with $c$
equal to 0 and 1/3, Theorem 4.1 follows.

The first part of Theorem 4.2 follows using Theorem 2.4 and the second part
from the fact, proved in Section 5, that $d_n < 1$ for $n \ge .37$. Corollary 1 follows
from the fact, proved in Section 5, that $e^{-1/9n}d_n^{-1} \le 1 + b_2/n$ for $n \ge n_0 > 0$.
Corollary 2 follows from Corollary 1 and the theorem.


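5. Bounding of some simple functions. In this section four results, used in
Sections 3 and 4, are derived. Specifically, with

$$a_n = \left(\frac{2}{n}\right)^{1/2} \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}, \qquad d_n = \frac{\left(\frac{n}{2}\right)^{(n-1)/2} e^{-n/2}\, (2\pi)^{1/2}}{\Gamma(n/2)},$$

$$b_1 = n_0\left[1 - \left(1 - \frac{1}{2n_0}\right)^{1/2}\right], \qquad b_2 = n_0\left[e^{1/18n_0} - 1\right],$$

(5.1) $a_n e^{k^2/2n} < 1$, $n \ge .50$;

(5.2) $\left(1 - \frac{1}{2n}\right)^{1/2} \ge 1 - \frac{b_1}{n}$, $n \ge n_0 > .50$;

(5.3) $d_n < 1$, $n \ge .37$;

(5.4) $e^{-1/9n}d_n^{-1} \le 1 + \frac{b_2}{n}$, $n \ge n_0 > 0$.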

An easily proved result that is used repeatedly (with $x = 1/n$) is the following:

LEMMA 4. If $f(x)$ has a uniformly convergent Maclaurin series for $0 \le x \le x_0$
and if all derivatives of $f(x)$ at $x = 0$ of order greater than $m$ are of constant sign,
say positive, then for all $0 \le x \le x_0$,

$$T_m(x) \le f(x) \le T_m(x) + x^{m+1}\left[\frac{f(x_0) - T_m(x_0)}{x_0^{m+1}}\right]$$

where $T_m(x)$ is the partial sum through order $m$ of the Maclaurin series. (If the sign
is negative, the direction of the inequalities is reversed.)

(5.2) is a direct application of Lemma 4.

The Stirling expansion with argument $n/2$ is just the expansion of $-\log d_n$
and the first two partial sums bracket the value ([5], p. 253).

(5.5) $$\frac{1}{6n} - \frac{1}{45n^3} \le -\log d_n \le \frac{1}{6n}, \qquad n > 0.$$

By the duplication formula for the gamma function, $a_n = d_n^2/d_{2n}$, so that

(5.6) $$-\frac{1}{4n} - \frac{1}{360n^3} \le \log a_n \le -\frac{1}{4n} + \frac{2}{45n^3}, \qquad n > 0.$$

Using (5.5), it follows that $\log d_n < 0$ and hence (5.3) for $n^2 \ge 2/15$ or $n \ge .37$.
Also, for all $n > 0$,

$$e^{-1/9n}d_n^{-1} - 1 \le e^{1/18n} - 1$$

and (5.4) follows by application of Lemma 4.
From (5.6) it follows that

$$a_n e^{k^2/2n} \le \exp\left\{-\frac{1}{n}\left(\frac{1}{4} - \frac{k^2}{2} - \frac{2}{45n^2}\right)\right\}.$$

The exponent is negative if $n \ge .494$, proving (5.1).

VARIANCE OVER SMALL INTERVALS OF A
CONTINUOUS DISTRIBUTION


By Gunnar Ekman

1. Summary. Approximate expressions of more or less simple analytical form
are derived for the conditional mean and variance over small intervals of a
distribution having a probability density of somewhat restricted nature. An
alternative formula for the mean is derived. This result is applied to an extremal
problem in stratified sampling.


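2. Derivation of results. Consider a positive function $f(t)$, defined on some
finite interval containing points $x$ and $y$, and having continuous derivatives to
the fourth order. Let us define functions

(1) $$I_i(y, x) = \int_y^x (x - t)^i f(t)\, dt, \qquad i = 0, 1, 2.$$

These functions exist and may be partially integrated, whereby, using the mean
value theorem for integrals and writing $(x - y) = k$ for ease of notation, the
following identities are obtained:

(2) $$I_0(y, x) = kf + \frac{k^2}{2}f' + \frac{k^3}{6}f'' + \frac{k^4}{24}f''' + \frac{k^5}{120}f^{(4)},$$

$$I_1(y, x) = \frac{k^2}{2}f + \frac{k^3}{6}f' + \frac{k^4}{24}f'' + \frac{k^5}{120}f''' + \frac{k^6}{720}f^{(4)},$$

$$I_2(y, x) = \frac{k^3}{3}f + \frac{k^4}{12}f' + \frac{k^5}{60}f'' + \frac{k^6}{360}f''' + \frac{k^7}{2520}f^{(4)},$$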

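where all $f^{(i)}$ but $f^{(4)}$ are taken at the point $y$, $f^{(4)}$ being taken at points $\theta_i$ within
$(x, y)$. Let us now define three new functions by

(3) $$H_1(y, x) = \frac{I_1(y, x)}{I_0(y, x)} = \frac{k}{2}\left\{1 - \frac{k}{6}\frac{f'}{f} - \frac{k^2}{12}\left(\frac{f''}{f} - \frac{(f')^2}{f^2}\right) - k^3\left(\frac{f'''}{40f} + \frac{(f')^3}{24f^3} - \frac{5f'f''}{72f^2}\right) + O(k^4)\right\},$$

(4) $$H_2(y, x) = \frac{I_2(y, x)}{I_0(y, x)} = \frac{k^2}{3}\left\{1 - \frac{k}{4}\frac{f'}{f} - k^2\left(\frac{7f''}{60f} - \frac{(f')^2}{8f^2}\right) - k^3\left(\frac{f'''}{30f} + \frac{(f')^3}{16f^3} - \frac{f'f''}{10f^2}\right) + O(k^4)\right\},$$

(5) $$H_3(y, x) = H_2(y, x) - [H_1(y, x)]^2 = \frac{k^2}{12}\left\{1 + k^2\left(\frac{f''}{30f} - \frac{(f')^2}{12f^2}\right) + k^3\left(\frac{f'''}{60f} + \frac{(f')^3}{12f^3} - \frac{f'f''}{10f^2}\right) + O(k^4)\right\},$$

where we have imposed yet another condition on $f(t)$, namely $f(y) \ne 0$.
Furthermore we have the Taylor expansion

(6) $$\log f(x) - \log f(y) = k\frac{f'}{f} + \frac{k^2}{2}\left(\frac{f''}{f} - \frac{(f')^2}{f^2}\right) + \frac{k^3}{6}\left(\frac{f'''}{f} - \frac{3f'f''}{f^2} + \frac{2(f')^3}{f^3}\right) + O(k^4).$$

Using (6), we have from (3) and (5)

(7) $$H_1(y, x) - \frac{k}{2} + \frac{k}{12}\log\frac{f(x)}{f(y)} = \frac{-k^4}{720}R_1(y) + O(k^5),$$

(8) $$H_3(y, x) - \frac{k^2}{12} + \left(\frac{k}{12}\log\frac{f(x)}{f(y)}\right)^2 - \frac{k^4}{360}\frac{f''(y)}{f(y)} = \frac{k^5}{720}R_2(y) + O(k^6),$$

where

(9) $$R_1(t) = \left[\frac{5f'(t)f''(t)}{(f(t))^2} - \frac{5(f'(t))^3}{(f(t))^3} - \frac{f'''(t)}{f(t)}\right], \qquad R_2(t) = \left[\frac{f'''(t)}{f(t)} - \frac{f'(t)f''(t)}{(f(t))^2}\right].$$

From the definition of $H_1(y, x)$ and $H_3(y, x)$, by (1) and (3)-(5), we see that
the functions $[H_1(y, x) - (k/2)]$ and $H_3(y, x)$ are symmetrical in $x$ and $y$, so
that the same may be said for the left hand members of (7) and (8). This
implies that $x$ and $y$ may be interchanged in the right hand members of these
identities without changing the order of magnitude of the terms.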

Now we see from (1) and (3)-(5), if $\mu(y, x)$ and $\sigma^2(y, x)$ denote the conditional
mean and conditional variance respectively over $(y, x)$ of the function $f(t)$
considered as a probability density, that we have

$$H_1(y, x) = x - \mu(y, x), \qquad H_3(y, x) = \sigma^2(y, x),$$

so that we are led from (7) and (8) to the following approximations:

(10) $$\mu(y, x) \approx \frac{x + y}{2} + c(y, x) + \frac{(x - y)^4}{720}R_1(y),$$

(11) $$\sigma^2(y, x) \approx \frac{(x - y)^2}{12} - [c(y, x)]^2 + \frac{(x - y)^4}{360}\frac{f''(y)}{f(y)} + \frac{(x - y)^5}{720}R_2(y),$$

where

(12) $$c(y, x) = \frac{(x - y)}{12}\log\frac{f(x)}{f(y)}$$

and where $R_1(t)$, $R_2(t)$ are given by (9); $x$ and $y$ may be interchanged in (10)
and (11). We note that by taking only values of $f(t)$ at the end points $(x, y)$
into consideration, that is, by neglecting all but the first two terms in (10) and
(11), we may approximate $\mu$ and $\sigma^2$ correctly to $O((x - y)^4)$. We call attention to
the fact that the logarithms are to the base $e$, and that the conditions imposed
on $f(t)$ are the following: $0 < f(x), f(y) < \infty$, and the first four derivatives of
$f$ exist and are continuous; these conditions may be weakened and were imposed
for ease of derivation. Finally, it may be remarked that approximations
containing $I_0(y, x)$ explicitly may be derived in the case $f(x)$ or $f(y) = 0$.
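The end-point approximations can be compared with exact values for a density whose conditional moments are known in closed form. The sketch below is illustrative: the function names are hypothetical, and the exponential density $e^{-t}$ is chosen only because its truncated mean and variance are elementary.

```python
import math


def endpoint_approximations(f, d2f, y, x):
    """Return (mean, var2, var3) from (10)-(12).

    mean uses the first two terms of (10); var2 the first two terms of (11);
    var3 adds the third term of (11), which needs the second derivative d2f.
    """
    k = x - y
    c = k / 12.0 * math.log(f(x) / f(y))          # c(y, x) of (12)
    mean = 0.5 * (x + y) + c
    var2 = k * k / 12.0 - c * c
    var3 = var2 + k ** 4 / 360.0 * d2f(y) / f(y)
    return mean, var2, var3


def exact_exponential(y, x):
    """Exact conditional mean and variance of the density e^{-t} over (y, x)."""
    h = x - y
    p = -math.expm1(-h)                                      # 1 - e^{-h}
    m1 = (1.0 - (1.0 + h) * math.exp(-h)) / p                # E[t - y]
    m2 = (2.0 - (h * h + 2.0 * h + 2.0) * math.exp(-h)) / p  # E[(t - y)^2]
    return y + m1, m2 - m1 * m1


density = lambda t: math.exp(-t)   # f and f'' coincide for the exponential
MEAN, VAR2, VAR3 = endpoint_approximations(density, density, 0.0, 0.5)
EXACT_MEAN, EXACT_VAR = exact_exponential(0.0, 0.5)
```

On the interval (0, 0.5) the two-term formulas already agree with the exact values to about four decimals, and the third term of (11) reduces the variance error further.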


3. Further results. We shall derive another approximation to $\mu(y, x)$, assuming
the existence and continuity of $f'$ and $f''$ and with $f(x), f(y) \ne 0$. By cubing
both sides of (3) we find

(13) $$[x - \mu(y, x)]^3 = \frac{k^3}{8}\left\{1 - \frac{k}{2}\frac{f'}{f} + O(k^2)\right\}.$$

On the other hand, using the Taylor expansion

$$f(x) = f + k \cdot f' + O(k^2)$$

together with (2), we obtain from (13)

(14) $$\frac{I_0(y, x)\,(x - y)^2}{8f(x)} = [x - \mu(y, x)]^3 + O(k^5).$$

Assuming $x > y$ for definiteness and writing $I_0(y, x) = P(y, x)$ = the area
under $f(t)$ in $(y, x)$, we have from (14)

(15) $$\mu(y, x) = x - \left\{\frac{P(y, x) \cdot (x - y)^2}{8f(x)}\right\}^{1/3} + O(k^3),$$

from which, by permuting $x$ and $y$, we obtain also

(16) $$\mu(y, x) = y + \left\{\frac{P(y, x) \cdot (x - y)^2}{8f(y)}\right\}^{1/3} + O(k^3).$$

These approximations, (15) and (16), are less accurate than (10), even when
the last term of the latter is neglected, but may be used to obtain an approximate
solution to the following problem arising in the theory of stratified sampling with
proportionate allocation, see [1]. Given a density $f(x)$ over a range $(x_0, x_n)$,
$(n - 1)$ variable points $x_i$, $i = 1, \cdots, (n - 1)$, $x_{i-1} < x_i$, and denoting by
$P_h$, $\mu_h$ and $\sigma_h^2$ the area, conditional mean and conditional variance in the interval
$(x_{h-1}, x_h)$, we are to minimize

(17) $$\sum_{h=1}^{n} P_h \sigma_h^2.$$

In [1] it is shown that the points minimizing (17) satisfy

(18) $$x_h - \mu_h = \mu_{h+1} - x_h, \qquad h = 1, \cdots, (n - 1),$$

and it is seen that the determination of such points may give rise to some
computational difficulties. Let us now assume $(x_n - x_0)$ finite, $0 < f(x) < \infty$, and
that $f'(x)$ and $f''(x)$ exist and are continuous over the whole range. Taking
$y = x_{h-1}$ and $x = x_h$ in (15) and $y = x_h$ and $x = x_{h+1}$ in (16), we see that if
we neglect terms of order $O(k^3)$, the equations

$$(x_h - x_{h-1})^2 P_h = (x_{h+1} - x_h)^2 P_{h+1}, \qquad h = 1, \cdots, (n - 1),$$

that is,

(19) $$(x_h - x_{h-1})^2 P_h = K_n, \qquad h = 1, \cdots, n,$$

where $K_n$ is a constant dependent on $f(x)$ and $n$, may be substituted for (18).
This result may be compared with the approximate solution $(x_h - x_{h-1})P_h = C_n$
given in [2] to the similar problem of minimizing $\sum_1^n P_h\sigma_h$; we see by (11) that
(19), and (11) of [2], may be replaced by $P_h\sigma_h^2 = K_n'$ and $P_h\sigma_h = C_n'$,
respectively, without affecting the degree of approximation, whereby a certain analogy
between the two results is discerned. Proceeding as in [2] we come to the same
results as to the respective degree of approximation to the true minimizing values
of the points satisfying (19) and the resulting sum (17). We see that
when $f(x_0) = 0$ or $x_0 = -\infty$ and/or $f(x_n) = 0$ or $x_n = +\infty$ we may substitute

(20) $$8f(x_1) \cdot (x_1 - \mu_1)^3 = K_n \quad \text{and/or} \quad K_n = 8f(x_{n-1}) \cdot (\mu_n - x_{n-1})^3$$

for those equations of (19) with $h = 1$ and/or $h = n$, also that $K_n$ varies with
$n$ about as $n^{-3}$, and that an iterative method of finding $K_n$ may be employed.
Finally we note that the methods of the last section of [2] may be used even in
the present case, if we put $(x_h - \mu_h) = A_h$ and $(\mu_{h+1} - x_h) = B_h$, which
results in

$$1 + \frac{\partial B_{h-1}}{\partial x_{h-1}} = -\frac{\partial A_h}{\partial x_{h-1}} = \frac{f(x_{h-1}) \cdot (\mu_h - x_{h-1})}{P_h},$$

$$1 - \frac{\partial A_h}{\partial x_h} = \frac{\partial B_{h-1}}{\partial x_h} = \frac{f(x_h) \cdot (x_h - \mu_h)}{P_h}.$$
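One possible iterative scheme for the constant in (19) is sketched below. It is an assumption of this example, not the method of [1] or [2]: the name `strata_boundaries` is hypothetical, the density enters only through its distribution function `cdf`, and the constant is found by bisection, marching the boundaries from $x_0$ for each trial value.

```python
def strata_boundaries(cdf, x0, xn, n, iters=100):
    """Boundaries x_0 < x_1 < ... < x_n with (x_h - x_{h-1})^2 * P_h equal for all h.

    cdf must be strictly increasing on (x0, xn). Returns (boundaries, constant).
    """
    span = xn - x0

    def weight(a, b):
        return (b - a) ** 2 * (cdf(b) - cdf(a))

    def next_boundary(a, target):
        lo, hi = a, a + span
        while weight(a, hi) < target:      # expand until the target is bracketed
            hi += span
        for _ in range(iters):             # bisection: weight(a, .) is increasing
            mid = 0.5 * (lo + hi)
            if weight(a, mid) < target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def march(target):
        xs = [x0]
        for _ in range(n):
            if xs[-1] >= xn:               # ran past the range early: target too large
                return xs, True
            xs.append(next_boundary(xs[-1], target))
        return xs, xs[-1] > xn

    k_lo, k_hi = 0.0, weight(x0, xn)
    for _ in range(iters):                 # bisection on the common constant
        k_mid = 0.5 * (k_lo + k_hi)
        if march(k_mid)[1]:
            k_hi = k_mid
        else:
            k_lo = k_mid
    return march(k_lo)[0], k_lo
```

For a uniform density the boundaries come out equally spaced, as expected; for a decreasing density the strata widen toward the tail.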
ON THE MOMENTS OF THE TRACE OF A MATRIX AND
APPROXIMATIONS TO ITS DISTRIBUTION


By K. C. S. Pillai and Tito A. Mijares


The Statistical Center, University of the Philippines


1. Summary. The first four moments of the sum of $s$ non-null latent roots of a
matrix occurring in multivariate analysis are studied. In particular, the first
four moments of the sum of six roots are found, and are used to compare the
upper percentage points obtained directly from the moment ratios with those
from Pillai's approximate distribution.


2. Introduction. Distribution problems in multivariate analysis are often
related to the distribution of the latent roots of a matrix derived from sample
observations taken from multivariate normal populations. The form of the joint
distribution of $s$ non-null latent roots of a matrix in multivariate analysis as
given by Roy [10], Hsu [3], and Fisher [2] is expressible as

(2.1) $$f(\theta_1, \cdots, \theta_s) = C(s, m, n) \prod_{i=1}^{s} \theta_i^m (1 - \theta_i)^n \prod_{i>j} (\theta_i - \theta_j), \qquad 0 < \theta_1 \le \cdots \le \theta_s < 1,$$

where

(2.2) $$C(s, m, n) = \frac{\pi^{s/2} \prod_{i=1}^{s} \Gamma\left(\frac{2m + 2n + s + i + 2}{2}\right)}{\prod_{i=1}^{s} \Gamma\left(\frac{2m + i + 1}{2}\right) \Gamma\left(\frac{2n + i + 1}{2}\right) \Gamma\left(\frac{i}{2}\right)}$$

and $m$ and $n$ are defined differently for various situations described in [7].

Pillai [6], [8] has studied the distribution of the proposed test criterion,
$V^{(s)} = \sum_{i=1}^{s} \theta_i$, deriving the first three moments and obtaining the fourth moment
for $s = 2, 3$ and 4. He has also suggested [6], [8] an incomplete Beta distribution
approximation to the distribution of $V^{(s)}$, and tabulated approximate
percentage points for various values of the supplementary parameters $m$ and $n$
[9]. In this paper, the work of Pillai has been extended to generalize the fourth
moment.


3. The moment generating function of the sum of the roots. The joint
distribution given in (2.1) can be thrown into a determinantal form of the
Vandermonde type and the moment generating function for the sum of $s$ non-null roots,
$V^{(s)}$, can be expressed in the determinantal form [6], [8]
E(e’’) 
| 1 1 
[ "(1 — 0,)"e** dé, «++ [ 6;(1 — 0,)"e* dé, 
0 0 
= C(s, m, n) [| #e#e ® ee eee eee ee ee ee Oe ee eae ‘* 


Os * | 
f ore" *(1 — 0,)"e"* dO, --- | Or(1 — 6:)"e"* d6y| 
0 0 


We may denote the pseudo-determinant (P.D.) [6], [8] in (3.1) by

$$U(m + s - 1, m + s - 2, \cdots, m; n; t)$$

and more conveniently, when $t = 0$, by $U(s - 1, s - 2, \cdots, 1, 0)$.
Differentiating (3.1) successively [1], [4] with respect to $t$ and setting $t = 0$
after each differentiation, we obtain

(3.2) $$E(V^{(s)}) = \mu_1' = C(s, m, n)\,U(s, s - 2, s - 3, \cdots, 1, 0);$$

(3.3) $$E[(V^{(s)})^2] = \mu_2' = C(s, m, n)[U(s + 1, s - 2, s - 3, \cdots, 1, 0) + U(s, s - 1, s - 3, \cdots, 1, 0)];$$

(3.4) $$E[(V^{(s)})^3] = \mu_3' = C(s, m, n)[U(s + 2, s - 2, s - 3, \cdots, 1, 0) + 2U(s + 1, s - 1, s - 3, \cdots, 1, 0) + U(s, s - 1, s - 2, s - 4, \cdots, 1, 0)];$$

(3.5) $$E[(V^{(s)})^4] = \mu_4' = C(s, m, n)[U(s + 3, s - 2, s - 3, \cdots, 1, 0) + 3U(s + 2, s - 1, s - 3, \cdots, 1, 0) + 2U(s + 1, s, s - 3, s - 4, \cdots, 1, 0) + 3U(s + 1, s - 1, s - 2, s - 4, \cdots, 1, 0) + U(s, s - 1, s - 2, s - 3, s - 5, \cdots, 1, 0)].$$

The method can be extended for any number of differentiations. It may be noted
that the first relation (3.2) for $\mu_1'$ contains only one P.D., since the other five
vanish because two columns in each are equal.


4. Values of the pseudo-determinants. In this section we give the values of
the P.D.'s involved in the expressions for the first four raw moments following
(3.2)-(3.5). These were evaluated using a reduction formula by Pillai [8]. For
details the reader is referred to [4].

Let us set $(2g + a)(2g + b) \cdots = G(a, b, \cdots)$ and $(m + n) = p$. Then
the P.D. for the first raw moment is

(4.1) $$C(s, m, n)[U(s, s - 2, s - 3, \cdots, 1, 0)] = \frac{sM(s + 1)}{P(2s + 2)}.$$
For the second raw moment, we find
C(s, m, n)[U(s, s — 1,8 — 3,8 — 4,---,1,0)] 
(4.2) _ 8(s — 1) M(s, s + 1) 


2! P(2s + l, 2s + 2)’ 
C(s, m,n)[(U(s + 1,8 — 2,8 — 3,8 — 4, ---,1,0)] 


(43) (8-1) M(ss+1)_ 
Qt P(e + 1, 2s + 4) 


For the third raw moment, we find 
C(s, m, n)[U(s, s — 1,8 — 2,8 — 4,---,1,0)] 
(4.4) _ a(s — 1)(8 — 2) M(s — 1, 8,8 + 1) 
t 3! P(2s, 2s + 1, 2s + 2)’ 
C(s, m, n)[U(s + 1,8 — 1,8 — 3, ---,1,0)] 
_ (8 — 1) __ M(ss+1) 
, ~  3t P(2s, 2s + 1, 28 + 2, 2s + 4) 
-{4n[(2s — 1)m + (s* + 2)] + 4(2s — 1)m? + 128’m + (45° + 28” + 28 + 4)} 


+e <5 C(s, m, n)[U(s, 8 — le-d3,e-—4, -++,1,0)], 


C(s, m, n)[U(s + 2,8 — 2,8 — 3, ---,1,0)] 
_ (s + 1)a(s — 1) M(s, s + 1,8 + 2) 


3! P(2s + 1, 2s + 2, 2s + 6) 


sM(s + 1, 2s + 2) 
P(2s + 2, 2s + 4) ° 


- C(s, m, n)[U(s + 1, s — 2,8 — 3, ---,1,0)] 


And for the fourth raw moment, we find 
C(s, m, n)[U(s, s — 1,8 — 2,8 — 3,8 — 5,---,1,0)] 
_ s(s — 1)(s — 2)(s — 3) M(s — 2, s—1,3s,s+1) 


4! P(2s — 1, 2s, 2s + 1, 2s + 2)’ 
C(s, m,n)[U(s + 1,8 — 1,s — 2,8 — 4,8 — 5, ---,1,0)] 


_ 8(s — 1)(s — 2) M(s — 1, 8,8 + 1) 
~  Sf2! P(2s — 1, 2s, 2s + 1, 2s + 2, 2s + 4) 


- {n[2(3s — 1)m + (38" + 8 + 10)] + 2(38 — 1)m’ 
+ (98° — s + 2)m + (38° + s* + 28 + 4)} 


m+s+l 
totet2 





C(s, m, n)[U(s,s — 1,8 — 2,8 — 4, ---,1,0)], 
C(s, m, n)[U(s + 1, s, s - 3, s - 4, ..., 1, 0)]
s(s — 1) M(s, s + 1) 


- {n®[16s(s + 1)m? + 8s(2s° + 5s + 9)m 
(4.9) + 4(s* + 48° + 11s’ + 8s + 12)] 
+ n[32s(s + 1)m* + 8s(8s” + 15s + 13)m? 
+ 48(10s° + 30s’ + 45s + 19)m 
+ 2(4s° + 17s‘ + 36s° + 31s’ + 8s + 12)] 
+ s(s + 1)M(s + 1, s + 2, 2s, 2s + 1)}, 
C(s, m,n)(U(s + 2,8 — 1,8 — 3,8 — 4,---,1,0)] 


_ (s + 1)s8(s — 1) Mle + 1, s + 2) eek & 
a2 3!2! P(2s,2s + 1, 2s + 2,28 + 3, 2s + 6) 
(4.10) -{n[2(3s — 2)m + (38° — s + 10)] + 2(38 — 2)m’ 
+ (9s* + s — 2)m + (38° + 28° + s + 6)} 
m+s+2 , 7 % ad Me iho 
+ aes C(s, m, n)[U(s + 1,8 — 1,8 — 3, , 1, 0)], 


C(s, m, n)[U(s + 3, s — 2,8 — 3, ---,1,0)] 


_ (8s + 2)(s+ 1)s(s — 1) M(s, s+1,s+ 2,8+ 3) 

32) 4! P(2s + 1, 2s + 2, 2s + 3, 2s + 8) 

m+s+3 . 

sacar -+ O(s, m, n)[U(s + 2,8 — 2, ---,1,0)]. 

oe I + ) 

It may be noted that simplifications of relations (4.1) to (4.6) to obtain the
first, second and third central moments yield exactly the results given by Pillai
[6], [8]. The fourth central moment has not been obtained in general from (4.1)
to (4.11); however, for particular cases, like that for $s = 6$ in the next section,
these relations have been used to arrive at the expressions for $\beta_1 = \mu_3^2/\mu_2^3$ and
$\beta_2 = \mu_4/\mu_2^2$, where $\mu$'s denote central moments [4].


5. Percentage points of $V^{(6)}$ using moment ratios and a Pearson curve
approximation, and using Pillai's approximate beta distribution. Putting $s = 6$ in
relations (4.1) to (4.11) the following expressions for the $\beta$'s are obtained:

(5.1) $$\beta_1 = \frac{4(n - m)^2(p + 1)^2(p + 8)(2p + 13)}{3(2m + 7)(2n + 7)(p + 4)(p + 6)^2(p + 9)^2},$$

(5.2) $$\beta_2 = \frac{(p + 8)(2p + 13)}{(2m + 7)(2n + 7)(p + 4)^2(p + 6)(p + 9)(p + 10)(2p + 11)(2p + 15)}$$
$$\cdot\, [4mn(6p^5 + 179p^4 + 2177p^3 + 13{,}176p^2 + 38{,}732p + 43{,}760)$$
$$+\ (92p^6 + 2996p^5 + 40{,}869p^4 + 294{,}677p^3 + 1{,}169{,}444p^2 + 2{,}408{,}532p + 2{,}010{,}960)].$$

Table 5A shows the values of $\beta_1$ and $\beta_2$ for several values of $m$ and $n$. The
upper 5% points of $V^{(6)}$ given in Table 5B for each given $m$ and $n$ were obtained
by interpolating in Table 42 of [5], "Percentage points of Pearson Curves for
given $\beta_1$, $\beta_2$, expressed in standardized measure."

The percentage points from Pillai's approximate distribution [9] were
obtained by referring to Snedecor's $F$ tables using the transformation

$$F = \frac{(2n + s + 1)}{(2m + s + 1)} \cdot \frac{V^{(s)}}{s - V^{(s)}}$$

with $f_1 = s(2m + s + 1)$ and $f_2 = s(2n + s + 1)$ degrees of freedom.
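The transformation to an $F$ variate is simple enough to state as code. This is a sketch with a hypothetical function name, using the form of the transformation given above.

```python
def pillai_f_transform(v, s, m, n):
    """Return (F, f1, f2) for a value v of the trace criterion V^(s), 0 < v < s."""
    f_stat = (2 * n + s + 1) / (2 * m + s + 1) * v / (s - v)
    f1 = s * (2 * m + s + 1)   # numerator degrees of freedom
    f2 = s * (2 * n + s + 1)   # denominator degrees of freedom
    return f_stat, f1, f2
```

For $s = 6$, $m = 0$, $n = 10$ and the percentage point 1.601 discussed in this section, this gives $F$ of about 1.40 on 42 and 162 degrees of freedom.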


TABLE 5A


Values of β_1 (upper entry) and β_2 (lower entry) of the exact distribution for s = 6








»” 


0.03919 0.00000 0.00990 
3.00925 2.97815 2.97594 2.99674 


0.10177 0 1615 0.00990 0.00000 
3.11486 ‘4.01774 2.99674 2.98732 


0.13690 
3.18341 


0.04460 0.02192 0.00260 
3.05136 3.01996 2.99453 


0.15553 
3.21669 


0.05814 0.03013 0.00636 
3.07233 3.03597 


| 
| 
| 
| 


TABLE 5B 


Upper 5% points of V using (1) the 6’s of the exact distribution and a 
Pearson curve approximation, and (2) Pillai’s approximation 
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It may be seen from Table 5B that the percentage points computed by the 
two methods practically agree in the first two places even for small values of m 
and n. Further, we may also study the difference in probabilities corresponding 
to the percentage points from the two approximations. This may be done by 
considering the percentage points for approximation (1) in Table 5B and evalu- 
ating the probability in each case from Pillai’s beta distribution approximation. 
For example, taking m = 0, n = 10 and percentage point 1.601, exact integra- 
tion of the incomplete beta function gives the probability 0.9298 as against 
0.95. However, for m = 0 and n = 60, with percentage point 0.427, the prob- 
ability obtained by the same procedure is 0.9462. Hence, for larger values of n, 
as in the case of percentage points, the difference in probabilities also tends to be 
small. 
by the Statistical Center, University of the Philippines, for preparing this paper.
RANDOM GRAPHS


By E. N. GILBERT


Bell Telephone Laboratories, Inc., Murray Hill, New Jersey 


1. Introduction. Let N points, numbered 1, 2, ..., N, be given. There are
$N(N-1)/2$ lines which can be drawn joining pairs of these points. Choosing a
subset of these lines to draw, one obtains a graph; there are $2^{N(N-1)/2}$ possible
graphs in total. Pick one of these graphs by the following random process. For
all pairs of points make random choices, independent of each other, whether or
not to join the points of the pair by a line. Let the common probability of
joining be p. Equivalently, one may erase lines, with common probability
$q = 1 - p$, from the complete graph.

In the random graph so constructed one says that point i is connected to point j
if some of the lines of the graph form a path from i to j. If i is connected to j
for every pair i, j, then the graph is said to be connected. The probability $P_N$
that the graph is connected, and also the probability $R_N$ that two specific points,
say 1 and 2, are connected, will both be found.
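
The random process just described is easy to simulate. The following Python sketch is a Monte Carlo illustration only, not the method of this paper: it draws each of the N(N - 1)/2 lines independently with probability p and estimates both probabilities from repeated trials. The function names are hypothetical, and the points are numbered 0, ..., N - 1, so that "points 1 and 2" of the text are indices 0 and 1.

```python
import random


def sample_graph(n_points, p, rng):
    """Keep each line (i, j), i < j, independently with probability p."""
    return [(i, j)
            for i in range(n_points)
            for j in range(i + 1, n_points)
            if rng.random() < p]


def component_labels(n_points, lines):
    """Union-find: return one representative label per point."""
    parent = list(range(n_points))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in lines:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    return [find(a) for a in range(n_points)]


def estimate_probabilities(n_points, p, trials=20000, seed=0):
    """Return Monte Carlo estimates of (P_N, R_N)."""
    rng = random.Random(seed)
    connected = joined = 0
    for _ in range(trials):
        labels = component_labels(n_points, sample_graph(n_points, p, rng))
        connected += len(set(labels)) == 1   # whole graph connected
        joined += labels[0] == labels[1]     # first two points connected
    return connected / trials, joined / trials
```

For N = 3 and p = 0.5 the estimates should fall near 0.5 and 0.625, the values obtained by direct enumeration of the eight possible graphs.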

As an application, imagine the N points to be N telephone central offices and 
suppose that each pair of offices has the same probability p that there is an idle 
direct line between them. Suppose further that a new call between two offices 
can be routed via other offices if necessary. Then $R_N$ is the probability that
there is some way of routing a new call from office 1 to office 2 and $P_N$ is the
probability that each office can call every other office.

Exact expressions for $P_N$ and $R_N$ are given in Section 2. These results are
unwieldy for large N. Bounds on $P_N$ and $R_N$ derived in Section 3 show that

(1)  $P_N \sim 1 - N q^{N-1}$

and

(2)  $R_N \sim 1 - 2 q^{N-1}$

asymptotically as $N \to \infty$.

Other related results appear in a paper by Austin, Fagen, Penney, and Riordan 
[1]. These authors use a different random process to pick a graph and they find 
a generating function for the distribution of the number of connected pieces in 
the random graph. 


2. Exact results. $P_N$ may be expressed in terms of the number $C_{N,L}$ of connected
graphs having N labeled points and L lines. Since each such graph has
probability $p^L q^{N(N-1)/2 - L}$ of being the chosen graph, it follows that

$P_N = \sum_L C_{N,L}\, p^L q^{N(N-1)/2 - L}$.

In this formula the range of summation is $N - 1 \le L \le N(N-1)/2$. In [3]
and [4] a generating series for $C_{N,L}$ was given in the form

$\sum_{N=1}^{\infty} \sum_L \dfrac{C_{N,L}\, x^N y^L}{N!} = \log\left(1 + \sum_{i=1}^{\infty} \dfrac{x^i (1+y)^{i(i-1)/2}}{i!}\right)$.

This result is easily converted into a generating series for $P_N$, viz.,

(3)  $\sum_{N=1}^{\infty} \dfrac{P_N\, x^N q^{-N(N-1)/2}}{N!} = \log\left(1 + \sum_{i=1}^{\infty} \dfrac{x^i q^{-i(i-1)/2}}{i!}\right)$.

It may be noted that, when $0 \le q < 1$ and $x \ne 0$, neither series in (3)
converges. The equality in (3) merely signifies that $P_N$ may be found by formally
expanding the logarithm into a power series and collecting coefficients of $x^N$.
One can perform the expansion analytically to obtain an explicit formula

$P_N = N!\, q^{N(N-1)/2} \sum \dfrac{(-1)^{n-1} (n-1)!}{r_1!\, r_2! \cdots r_N!} \prod_{i=1}^{N} \left(\dfrac{q^{-i(i-1)/2}}{i!}\right)^{r_i}$.

The sum is extended over all non-negative integer solutions of
$r_1 + 2r_2 + \cdots + N r_N = N$ (i.e. over all partitions of N). The letter n in
the sum is $n = r_1 + \cdots + r_N$.

The first few instances of this formula are 


$P_1 = 1$
$P_2 = 1 - q$
$P_3 = 1 - 3q^2 + 2q^3$
$P_4 = 1 - 4q^3 - 3q^4 + 12q^5 - 6q^6$
$P_5 = 1 - 5q^4 - 10q^6 + 20q^7 + 30q^8 - 60q^9 + 24q^{10}$
$P_6 = 1 - 6q^5 - 15q^8 + 20q^9 + 120q^{11} - 90q^{12} - 270q^{13} + 360q^{14} - 120q^{15}$.

For larger values of N the number of terms in the formula for $P_N$ increases
rapidly. $P_N$ may then be computed more easily by means of the recurrence relation

(4)  $1 - P_N = \sum_{k=1}^{N-1} \binom{N-1}{k-1} P_k\, q^{k(N-k)}$.

The kth term of (4) is the probability that point 1 is connected to exactly k - 1
of the N - 1 other points. Then (4) follows by noting that point 1 is connected
to 0, 1, ..., or N - 1 other points with probability 1.
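
The recurrence lends itself to direct computation. The Python sketch below evaluates $1 - P_N = \sum_{k=1}^{N-1} \binom{N-1}{k-1} P_k q^{k(N-k)}$ step by step; the function name `connection_probabilities` is hypothetical, and passing q as a `Fraction` (an implementation choice made here) keeps the arithmetic exact.

```python
from fractions import Fraction
from math import comb


def connection_probabilities(n_max, q):
    """Return the list [None, P_1, ..., P_{n_max}] computed from recurrence (4)."""
    P = [None, 1]  # P_1 = 1: a single point is trivially connected
    for n in range(2, n_max + 1):
        # Sum over the size k of the connected piece containing point 1.
        not_connected = sum(
            comb(n - 1, k - 1) * P[k] * q ** (k * (n - k))
            for k in range(1, n)
        )
        P.append(1 - not_connected)
    return P


P = connection_probabilities(6, Fraction(3, 10))
print([float(value) for value in P[1:]])
```

Each new $P_N$ uses all earlier values, so a table up to N costs on the order of $N^2$ multiplications, against the rapidly growing number of partitions in the explicit formula.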

The argument which was used to derive (4) may be modified to give the
following formula for $R_N$:

(5)  $1 - R_N = \sum_{k=1}^{N-1} \binom{N-2}{k-1} P_k\, q^{k(N-k)}$.


TABLE 1

.78400  .50000  .21600
.99581  .89249  .59375  .21865
.99949  .95751  .71094  .25626
.99994  .98497  .81569  .31690
.90000  .70000  .50000  .30000
.98100  .84700  .75000  .36300
.99980  .98143  .85353  .52528
.9999980  .99850  .96302  .70634

The kth term of (5) is the probability that point 1 is connected to exactly k - 1
of the N - 2 points 3, ..., N and is not connected to point 2. Then the sum is
the probability that points 1 and 2 are not connected.
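
Formula (5) can be evaluated alongside (4). The following Python sketch assumes (5) in the form $1 - R_N = \sum_{k=1}^{N-1} \binom{N-2}{k-1} P_k q^{k(N-k)}$, with k counting the points in the connected piece containing point 1; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def graph_probabilities(n_max, q):
    """Return (P, R) with P[N] = P_N from (4) and R[N] = R_N from (5)."""
    P = [None, 1]
    R = [None, None]  # R_N is defined for N >= 2 only
    for n in range(2, n_max + 1):
        # Common factor P_k * q^{k(N-k)} shared by (4) and (5).
        terms = [P[k] * q ** (k * (n - k)) for k in range(1, n)]
        P.append(1 - sum(comb(n - 1, k - 1) * t
                         for k, t in enumerate(terms, start=1)))
        R.append(1 - sum(comb(n - 2, k - 1) * t
                         for k, t in enumerate(terms, start=1)))
    return P, R


P, R = graph_probabilities(7, Fraction(1, 10))
print([round(float(value), 7) for value in R[2:]])
```

For N = 2 the sum has the single term $P_1 q$, so $R_2 = p$, as it must be for a single possible line.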

Using these results, R. W. Hamming and the author computed numerical
values of $P_N$ and $R_N$ which appear in Table 1.


3. Bounds. The formulas of Section 2 solve the problem for small N only. In
this section we estimate $P_N$ and $R_N$ for large N. As N increases, the number of
paths by which points 1 and 2 may be joined increases. Then it is not surprising
that $R_N \to 1$ as $N \to \infty$ for every fixed p > 0. That $P_N \to 1$ too is less obvious
since increasing N also increases the number of pairs of points to be connected.
Indeed, Table 1 shows $P_N$ decreasing for $N \le 6$ when q = .9. The more precise
results (1) and (2) follow from the bounds which we now derive.

THEOREM 1:

$N q^{N-1} - \binom{N}{2} q^{2N-3} \le 1 - P_N$

and

$1 - P_N \le q^{N/2} \{ (1 + q^{(N-2)/2})^N - q^{(N-2)N/2} - 1 \}$.

THEOREM 2:

$2 q^{N-1} - q^{2N-3} \le 1 - R_N \le 2 q^{N-1} (1 + q^{(N-2)/2})^{N-2}$.

The lower bound in Theorem 2 is just the probability that at least one of the
two points 1, 2 is connected to no other point.

A similar idea is used in Theorem 1. A lower bound on $1 - P_N$ is the probability
T that at least one of the points 1, 2, ..., N is connected to no other
point. Let $E_i$ denote the event that point i is connected to no other point; then
T is the probability of the union of the events $E_1, \ldots, E_N$. A lower bound on T
(and hence on $1 - P_N$) is provided by an inequality of Bonferroni (see Feller [2], p. 100):

$\sum_i P(E_i) - \sum_{i<j} P(E_i E_j) \le T$.

Since $P(E_i) = q^{N-1}$ and $P(E_i E_j) = q^{2N-3}$, we obtain the lower bound stated.

The upper bounds are obtained using (4) and (5). In both cases we bound
$P_k$ by 1. To bound $q^{k(N-k)}$ we use the fact that x(N - x) is a concave function
of x, so that its graph lies above its chords. Then

$k(N-k) \ge \dfrac{N}{2} + \dfrac{(N-2)k}{2}$,  $1 \le k \le N/2$,

$k(N-k) \ge \dfrac{N}{2} + \dfrac{(N-2)(N-k)}{2}$,  $N/2 \le k \le N-1$,

and

$q^{k(N-k)} \le q^{N/2} \{ q^{(N-2)k/2} + q^{(N-2)(N-k)/2} \}$

for $1 \le k \le N-1$. When these bounds are inserted into (4) and (5), the
sums reduce to the expressions shown in Theorems 1 and 2.
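
A numerical check of the bounds may be helpful. The Python sketch below takes the bounds on $1 - P_N$ in the form $N q^{N-1} - \binom{N}{2} q^{2N-3}$ (Bonferroni) and $q^{N/2}\{(1+a)^N - a^N - 1\}$ with $a = q^{(N-2)/2}$, which is the closed form obtained by summing the inequality above against the binomial coefficients of (4); this reading of Theorem 1 and the function names are assumptions of the sketch.

```python
from math import comb


def one_minus_P(n_max, q):
    """Return the list [None, 1 - P_1, ..., 1 - P_{n_max}] from recurrence (4)."""
    P = [None, 1.0]
    gap = [None, 0.0]
    for n in range(2, n_max + 1):
        g = sum(comb(n - 1, k - 1) * P[k] * q ** (k * (n - k))
                for k in range(1, n))
        gap.append(g)
        P.append(1.0 - g)
    return gap


def theorem1_bounds(N, q):
    """Lower and upper bounds on 1 - P_N (form assumed as in the lead-in)."""
    a = q ** ((N - 2) / 2)
    lower = N * q ** (N - 1) - comb(N, 2) * q ** (2 * N - 3)
    upper = q ** (N / 2) * ((1 + a) ** N - a ** N - 1)
    return lower, upper


gap = one_minus_P(12, 0.3)
for N in (4, 8, 12):
    print(N, theorem1_bounds(N, 0.3), gap[N])
```

The two bounds share the leading term $N q^{N-1}$, which is why they close in on each other as N grows.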

When N becomes large the bounds are in close agreement. It follows from
Theorems 1 and 2 that

$P_N = 1 - N q^{N-1} + O(N^2 q^{3N/2})$,

and

$R_N = 1 - 2 q^{N-1} + O(N q^{3N/2})$.

Checking these approximate formulas against $P_N$ and $R_N$ in Table 1, it appears
likely that $N q^{N-1}$ and $2 q^{N-1}$ will represent $1 - P_N$ and $1 - R_N$ to within 3%
when $q \le .3$ and $N \ge 6$. For the same degree of approximation, larger values
of q will require larger values of N.
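
The quality of the approximation can be examined numerically. The Python sketch below compares $N q^{N-1}$ with the exact $1 - P_N$ obtained from recurrence (4); the function names are hypothetical, and the quantity reported is the relative error $|N q^{N-1} - (1 - P_N)| / (1 - P_N)$.

```python
from math import comb


def exact_gap(N, q):
    """Return 1 - P_N computed from recurrence (4)."""
    P = [None, 1.0]
    gap = 0.0
    for n in range(2, N + 1):
        gap = sum(comb(n - 1, k - 1) * P[k] * q ** (k * (n - k))
                  for k in range(1, n))
        P.append(1.0 - gap)
    return gap


def relative_error(N, q):
    """Relative error of the approximation N q^(N-1) to 1 - P_N (N >= 2)."""
    exact = exact_gap(N, q)
    return abs(N * q ** (N - 1) - exact) / exact


for N in (4, 6, 8, 10):
    print(N, relative_error(N, 0.3))
```

At q = .3 the relative error is close to 3% for N = 6 and falls quickly for larger N, in line with the remark above.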
SCALE MIXING OF SYMMETRIC DISTRIBUTIONS WITH
ZERO MEANS


By E. M. L. BEALE AND C. L. MALLOWS
Princeton University


0. Summary. Suppose that a distribution A is a mixture of distributions similar
to B but with different scale parameters; or (almost equivalently) that a distribution
F is a convolution of a given distribution G with some other distribution.
We derive conditions on (i) the moments of A and F and (ii) the derivatives
of A and F; these conditions are necessary, but are not sufficient in general.
The conditions (ii) are appropriate when B (or G) is of Pólya type 3.


1. Introduction. Suppose A(x) and B(x) are cumulative distribution functions
(c.d.f.'s) on the real line, continuous on the right, and a.e. symmetric about the
origin, so that

(1)  $A(x) + A(-x - 0) = 1 = B(x) + B(-x - 0)$,  $-\infty < x < \infty$.

We write $X_A$ for a random variable (r.v.) having the c.d.f. A(x), and similarly
for $X_B$. We shall say that A(x) is a B-mixture if there exist r.v.'s $X_A$, $X_B$,
and Y, where Y is non-negative and independent of $X_B$, such that

(2)  $X_A = X_B Y$

or equivalently, if there exists a c.d.f. $C(\sigma^2)$ on $[0, \infty)$, continuous on the right,
such that

(3)  $A(x) = \int_0^{\infty} B(x/\sigma)\, dC(\sigma^2)$,  $0 < x$,

where we interpret B(x/0) as 1 for x > 0. It is clear that A(x) is discontinuous
at x = 0 if $C(0) \ne 0$.

In a closely related situation (see Section 3 below), if F(x) and G(x) are c.d.f.'s
on the real line (not necessarily symmetric), we shall say that F is a G-convolution
if there exist r.v.'s $X_F$, $X_G$, and Z (Z having c.d.f. H(x) and being independent
of $X_G$, not necessarily non-negative) such that

(4)  $X_F = X_G + Z$.

Some general theorems concerning the existence and measurability of functions
related to mixtures of distributions were proved by Robbins [3]. Teichroew
[6] considered the case where B(x) is the unit Normal c.d.f., and $C(\sigma^2)$ is of
Pearson type III.

Received September 15, 1958; revised March 10, 1959. 
Our interest in the mixture problem arose out of some research where it was
possible to prove that a certain procedure was optimal whenever an error dis- 
tribution was Normal with zero mean but arbitrary variance, and also whenever 
it was a mixture of such distributions. It was thus of interest to determine as 
far as possible the properties of such mixtures. In general, given A and B, we 
would like to be able to determine whether or not A can be regarded as a B-mix- 
ture; and similarly for the convolution problem. 

Hirschman and Widder [1] investigate (4) at great length, but their results are
not in the form we desire; thus for the case where $X_G$ is normal (with mean zero
and variance v, say) they give two sets of necessary and sufficient conditions for
(4) to hold. The first of these ([1] Theorem 12.2) requires a knowledge of dF(x)/dv,
and the second ([1] Theorem 12.4) achieves the inversion of (4) by means of an
infinite series of derivatives of F(x); the required condition is that the sum of 
this series be everywhere non-decreasing (i.e. gives a c.d.f.). This last formula 
has been much used in practice; see e.g. Smart [5]. 

We assume that the distribution A (or F) is completely known; we do not say 
anything about the statistical problems of testing whether a random sample can 
reasonably be assumed to come from some B-mixture, and if so of estimating the 
mixing distribution C. Robbins [4] considers this estimation problem. He re- 
marks that it is of considerable importance in other connections, but awaits a 
satisfactory practical solution. 

In Section 2 we derive necessary and sufficient conditions for the existence of 
some A that is a B-mixture having given moments through order 2r. In Section 3 
we examine the relation between the mixture problem and the convolution prob- 
lem. In Section 4 we obtain a necessary condition for a given A to be a B-mixture 
(or for a given F to be a G-convolution) in terms of the frequency functions 
A’(x) (or F’(x)) and their derivatives; the validity of these conditions depends 
on certain properties of the derivatives of B (or G), related to the theory of
Pólya types; this relation is explored in Section 5.


2. Conditions on moments. From (2) we have immediately that if A is a
B-mixture, then

(5)  $E(X_A^{2r}) = E(X_B^{2r})\, E(Y^{2r})$

and the l.h.s. exists if and only if each of the factors on the r.h.s. is finite. Since
$Y^2$ is to be a r.v. on $[0, \infty)$, its moments must satisfy certain inequalities, the
simplest of which is the obvious one $E(Y^4) \ge \{E(Y^2)\}^2$. Hence we obtain necessary
relations between the moments of A and B; the simplest is

(6)  $\mu_4(A)/\mu_2(A)^2 \ge \mu_4(B)/\mu_2(B)^2$,

so that the kurtosis of a mixture is never less than the kurtosis of a single
component. Conversely, these relations are sufficient for the existence of some
distribution A(x) that is a B-mixture, having the given moments.
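
A small numerical illustration of the kurtosis inequality: for a mixture in which the scale Y takes finitely many values, independence gives $\mu_2(A) = \mu_2(B) E(Y^2)$ and $\mu_4(A) = \mu_4(B) E(Y^4)$, so the kurtosis of A is that of B multiplied by $E(Y^4)/\{E(Y^2)\}^2 \ge 1$. The Python sketch below computes this ratio; the function name and the example scales and weights are illustrative choices, not taken from the paper.

```python
def mixture_kurtosis(scales, weights, kurtosis_b=3.0):
    """Return mu_4/mu_2^2 for a finite scale mixture of a symmetric B.

    scales  : values taken by the scale variable Y
    weights : their probabilities (normalised here if they do not sum to 1)
    kurtosis_b : mu_4(B)/mu_2(B)^2, equal to 3 for a Normal B
    """
    total = sum(weights)
    second = sum(w * s**2 for s, w in zip(scales, weights)) / total   # E(Y^2)
    fourth = sum(w * s**4 for s, w in zip(scales, weights)) / total   # E(Y^4)
    return kurtosis_b * fourth / second**2


# Equal mixture of Normal components with standard deviations 1 and 3.
print(mixture_kurtosis([1.0, 3.0], [0.5, 0.5]))
```

With a single component the factor is 1 and the kurtosis of B is returned unchanged; any genuine spread in the scales pushes the value above that of B.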

The result that a mixture of Normal distributions (with zero means) is neces- 
sarily leptokurtic (unless it reduces to a single Normal distribution) seems to be 
widely known, though apparently unpublished. It is worth bearing in mind
when considering the argument that practical error distributions "must" tend
to Normality because the error is the sum of many independent components.
It is arguable that many error distributions are mixtures of distributions with a
common mean but different variances, and can therefore be expected to be
leptokurtic.


3. The distribution of $\ln |X_A|$. Another simple line of approach to the mixture
problem is to consider the distribution of $\ln |X_A|$. Before we can do this
we must consider the probability that $X_A = 0$, since $\ln |X_A|$ is then undefined.
Writing $A_0 = \Pr\{X_A \ne 0\}$ and similarly for $B_0$ and $C_0$, we have immediately
from (2) that

(7)  $A_0 = B_0 C_0$.

Also from (2), conditioned that none of the r.v.'s are zero (i.e. $X_A \ne 0$), we
have

(8)  $\ln |X_A| = \ln |X_B| + \ln Y$

which is exactly (4). Thus we have transformed the mixture problem into the
convolution problem. If we define the conditional characteristic functions of
$\ln |X_A|$ and $\ln |X_B|$ by

(9)  $\phi_A(t) = \dfrac{2}{A_0} \int_{0+}^{\infty} x^{it}\, dA(x)$,  $\phi_B(t) = \dfrac{2}{B_0} \int_{0+}^{\infty} x^{it}\, dB(x)$,

then we have from (7) and (8)

THEOREM 1: A necessary and sufficient condition for A to be a B-mixture is that
$A_0 \le B_0$ and $\phi_A(t)/\phi_B(t)$ is the ch.fn. of some distribution on $(-\infty, \infty)$.

In a sense this is the complete answer to the problem, but unfortunately the
criterion is not in general easy to apply. In some circumstances a numerical
approach based on (8) may be effective. An approach via the moments of
$\ln |X_A|$ and $\ln |X_B|$ (similar to that in the previous section) will yield a series
of necessary conditions.


4. Conditions on the frequency function. We now consider criteria based on
derivatives of the c.d.f.'s. Let us assume that B(x) is four times differentiable
everywhere, and that b(x) = B'(x) > 0 for all x. It will follow that any B-mixture
A(x) is four times differentiable everywhere except perhaps at x = 0, and that
a(x) = A'(x) > 0 wherever this exists.

Now A is assumed to be a mixture of distributions with zero mean and varying
scale parameter $\sigma$; so that part of the distribution A near x = 0 will consist
primarily of those components with small $\sigma$, while the part with |x| large will
consist primarily of components with large $\sigma$. We may expect to find a necessary
condition for A to be a B-mixture based on this fact, and the simplest such
condition seems to be the following:

Conjecture: If one assumed that only one component contributed to a(x) for
a particular x, and one estimated the value of $\sigma$ for this component from the
values of a(x) and a'(x), then for any A that is a B-mixture, the value of $\sigma$ so
defined is a non-decreasing function of |x|.

The value of $\sigma$ described is defined by the equation

(10)  $(x/\sigma)\, b'(x/\sigma)/b(x/\sigma) = x\, a'(x)/a(x)$.

If this equation holds for more than one $\sigma$, we could make the estimate unique by
agreeing to take the smallest value satisfying (10). But we can hardly expect
the conjecture to be true unless x b'(x)/b(x) is a strictly monotone function of x.
This is equivalent to the condition that the distribution B is strictly of Pólya
type 2 (monotone likelihood ratio) with respect to the parameter $\sigma$, as defined
by Karlin [2].

It turns out (see Section 5) that the conjecture is correct if B is also of Pólya
type 3 with respect to $\sigma$. Although it is possible to construct symmetrical
distributions that are not Pólya type 3 with respect to $\sigma$, almost all the principal
cases occurring in statistical practice, such as the Normal, double-exponential,
Cauchy, rectangular, triangular, are of this type.

In terms of the distribution of $\ln |X_A|$ and $\ln |X_B|$, the conjecture asserts
that if F is a G-convolution, and writing f(x) = F'(x), g(x) = G'(x), then the
value of $\mu$ defined by

(11)  $g'(x - \mu)/g(x - \mu) = f'(x)/f(x)$

is a non-decreasing function of x. In the following, we shall work in terms of the
convolution problem. We shall write

(12)  $R_i(x) = g^{(i)}(x)/g(x)$

so that

(13)  $R_1' = dR_1/dx = R_2 - R_1^2$.

THEOREM 2: If for all x, (i) g(x) > 0, (ii) $dR_1/dx < 0$, (iii) $R_2(x)$ is a convex
function of $R_1(x)$, then $\mu$, defined by (11), is a non-decreasing function of x.
Conversely, given (i) and (ii), if $\mu$ is non-decreasing for all G-convolutions, then
(iii) must hold.

The statement of the theorem for the mixture problem, with $\sigma$ defined by (10),
is the same as this with the R's defined as

(14)  $R_1(x) = 1 + \dfrac{x\, b'(x)}{b(x)}$,  $R_2(x) = 1 + \dfrac{3x\, b'(x)}{b(x)} + \dfrac{x^2\, b''(x)}{b(x)}$.

PROOF: From (11),

(15)  $R_1(x - \mu) = f'(x)/f(x)$

(16)  $\phantom{R_1(x - \mu)} = \dfrac{\int R_1(x - m)\, g(x - m)\, dH(m)}{\int g(x - m)\, dH(m)}$
which shows that $\mu$ exists (by (ii)). Multiplying (15) by f(x) and differentiating
with respect to x, we find

$f'(x) R_1(x - \mu) + f(x) R_1'(x - \mu)(1 - d\mu/dx) = f''(x)$

so that (using (13) and (15))

(17)  $-f(x) R_1'(x - \mu)\, d\mu/dx = f''(x) - f(x) R_2(x - \mu)$
      $= \int \{R_2(x - m) - R_2(x - \mu)\}\, g(x - m)\, dH(m)$.

Now f(x) > 0, and $R_1'(x - \mu) < 0$ by (ii); further, (iii) implies that for each
$x - \mu$ there exists some number k (independent of m) such that for all m

(18)  $R_2(x - m) - R_2(x - \mu) \ge k\{R_1(x - m) - R_1(x - \mu)\}$.

But from (16)

(19)  $\int \{R_1(x - m) - R_1(x - \mu)\}\, g(x - m)\, dH(m) = 0$

so that the r.h.s. of (17) is $\ge 0$, and $d\mu/dx \ge 0$ as required.

Conversely, suppose (iii) is false. We shall construct a G-convolution which
has $d\mu/dx < 0$ at x = 0. By our assumption, there exist $m_1$, $m_2$, $\mu$ (with
$m_1 < \mu < m_2$) such that

(20)  $\tfrac{1}{2}\{R_1(-m_1) + R_1(-m_2)\} = R_1(-\mu)$,

(21)  $\tfrac{1}{2}\{R_2(-m_1) + R_2(-m_2)\} < R_2(-\mu)$.

Now choose H(m) so that dH(m)/dm = 0 except at $m = m_1$ and $m_2$, with

(22)  $dH(m_i) = \dfrac{g(-m_i)^{-1}}{g(-m_1)^{-1} + g(-m_2)^{-1}}$,  i = 1, 2.

Then by (20), (16) is satisfied (for x = 0), and by (21), the r.h.s. of (17) is
< 0, so that $d\mu/dx < 0$.

Theorem 2 provides us with a necessary condition for F to be a G-convolution
(or for A to be a B-mixture); namely, the $\mu$ (or $\sigma$) defined by (11) (or (10))
must be a non-decreasing function of x. Unfortunately it will not provide a
sufficient condition unless $R_2$ is a linear function of $R_1$; and this is impossible over
the whole range of x. ($R_2$ can be a piecewise linear function of $R_1$ if we allow
$d^3 g/dx^3$ to be discontinuous.) However, relaxing condition (i) of the theorem,
we can obtain distributions for which $R_2$ is a linear function of $R_1$ wherever
g(x) > 0; two such distributions for the mixtures problem are the rectangular
and the triangular. It is easy to verify that a necessary and sufficient condition
for A to be a mixture of rectangular distributions is that A be unimodal; and
that necessary and sufficient conditions for A to be a mixture of triangular
distributions are that A should have a derivative a(x) everywhere except possibly
at x = 0, while a'(x) exists and is non-positive and non-decreasing for all x > 0.
If b(x) (or g(x)) > 0 for all x, then, for example, no distribution A for which
the estimate of $\sigma$ is constant for all $x > x_0$, but takes a different value for some
smaller value of x, can be a B-mixture.

If B is a Normal distribution (for which the conditions of Theorem 2 are
satisfied), the result can be utilized in the form of a log/square plot, in which
log a(x) is plotted as a function of $x^2$. It is easy to see that the slope of this
curve is $-1/(2\sigma^2)$, where $\sigma$ is the estimate defined by (10), and so is inversely
proportional to the square of that estimate; the theorem therefore shows that
a necessary condition for the given distribution A(x) to be a mixture of Normal
distributions is that the log/square plot be convex.
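
The log/square criterion can be tried on a known mixture. For a Normal b, equation (10) reduces to $\sigma^2 = -x\, a(x)/a'(x)$, and the slope of log a(x) against $x^2$ is $a'(x)/(2x\, a(x)) = -1/(2\sigma^2)$. The Python sketch below evaluates both for a finite mixture of zero-mean Normal densities; the function names and the example scales and weights are illustrative choices, not taken from the paper.

```python
import math


def normal_density(x, sigma):
    """Density at x of a Normal distribution with mean 0 and s.d. sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def sigma_estimate(x, scales, weights):
    """Estimate of sigma from (10) when B is Normal: sigma^2 = -x a(x)/a'(x).

    For the mixture a(x) = sum_i w_i n(x; s_i) one has
    a'(x) = -x * sum_i w_i n(x; s_i) / s_i^2, so the ratio is also defined at x = 0.
    """
    parts = [w * normal_density(x, s) for s, w in zip(scales, weights)]
    a = sum(parts)
    minus_a_prime_over_x = sum(part / s**2 for part, s in zip(parts, scales))
    return math.sqrt(a / minus_a_prime_over_x)


def log_square_slope(x, scales, weights):
    """Slope of log a(x) plotted against x^2, equal to -1/(2 sigma^2)."""
    return -0.5 / sigma_estimate(x, scales, weights) ** 2


grid = [0.25 * i for i in range(25)]
estimates = [sigma_estimate(x, [1.0, 3.0], [0.5, 0.5]) for x in grid]
print(estimates[0], estimates[-1])
```

For this mixture the estimate rises from a value a little above the smaller scale at x = 0 towards the larger scale for large x, so the slope of the log/square plot increases and the plot is convex.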


5. Relation to theory of Pólya types. The conditions required in Theorem 2
can be expressed in terms of the determinants

(23)  $\Delta_n = \left| \dfrac{\partial^{i+j}}{\partial x^i\, \partial \mu^j}\, p(x, \mu) \right|_{i,j=0}^{n-1}$

for n = 1, 2, 3 with $p(x, \mu) = g(x - \mu)$. We shall prove

THEOREM 3: The conditions (i), (ii), (iii) of Theorem 2 are equivalent to

$\Delta_1 > 0$,  $\Delta_2 > 0$,  $\Delta_3 \ge 0$.

PROOF: It is easy to see that the signs of these determinants are unaffected
by a monotonic increasing transformation of either the independent variable x
or the parameter $\mu$; so that a proof of the theorem for the convolution problem
will imply the corresponding result for the mixtures problem also. In the
following, the argument of all the functions involved is $x - \mu$.

For n = 1, (23) gives g > 0, which is (i). For n = 2, (23) gives

(24)  $\begin{vmatrix} g & -g' \\ g' & -g'' \end{vmatrix} > 0$,  i.e.  $\begin{vmatrix} 1 & R_1 \\ R_1 & R_2 \end{vmatrix} < 0$,

i.e. by (13), $R_1' < 0$, which is (ii). Now

(25)  $d^2 R_2/dR_1^2 = (R_1')^{-3} (R_1' R_2'' - R_1'' R_2')$

so that condition (iii) is equivalent to

(26)  $\begin{vmatrix} 1 & 0 & 0 \\ R_1 & R_1' & R_1'' \\ R_2 & R_2' & R_2'' \end{vmatrix} \le 0$.

By differentiation we have successively

$g' = g R_1$,  $g'' = g' R_1 + g R_1'$,  $g''' = g'' R_1 + 2 g' R_1' + g R_1''$,
$g'' = g R_2$,  $g''' = g' R_2 + g R_2'$,  $g'''' = g'' R_2 + 2 g' R_2' + g R_2''$.

Hence manipulating the determinant in (26) according to the scheme

(27)  (col 3)' = g(col 3) + 2g'(col 2) + g''(col 1)

(28)  (col 2)' = g(col 2) + g'(col 1)

(and multiplying column 1 by g) we obtain

(29)  $\begin{vmatrix} g & g' & g'' \\ g' & g'' & g''' \\ g'' & g''' & g'''' \end{vmatrix} \le 0$

which is equivalent to $\Delta_3 \ge 0$.
Karlin [2] calls a family of distributions 


(30)  $P(x, \mu) = \beta(\mu) \int_{-\infty}^{x} p(z, \mu)\, d\lambda(z)$


of Pólya type m (strictly of Pólya type m) if the determinants

(31)  $D_n = |\, p(x_i, \mu_j)\, |_{i,j=1}^{n}$

are $\ge 0$ (> 0) for n = 1, 2, ..., m, for all

(32)  $x_1 < x_2 < \cdots < x_n$,  $\mu_1 < \mu_2 < \cdots < \mu_n$.

Karlin shows that Pólya m implies $\Delta_n \ge 0$ (n = 1, ..., m), while $\Delta_n > 0$
(n = 1, ..., m) implies strict Pólya m.

We are indebted to the referee for the following remarks. One can derive
only $\Delta_n \ge 0$ when assuming strict Pólya m, with $\Delta_n > 0$ for almost all x and
$\mu$. It is true however that if $p(x, \mu) = p(x - \mu)$ (as is the case in the present
problem), then the equivalence is correct. This last result is quite deep and is
not published in the literature. Most strict Pólya type distributions satisfy
$\Delta_n > 0$ everywhere, but there may be isolated points where equality takes place.

Thus our conditions (i) and (ii) are equivalent to strict Pólya 2, and (iii) is
implied by Pólya 3.

Karlin [2] remarks that if $\Delta_n \ge 0$ (n = 1, ..., m) with strict inequality
almost everywhere, then under a certain weak assumption the convolution of
G(x) with a Normal distribution of arbitrarily small variance $\sigma^2$ will be strictly
Pólya m, and hence (taking the limit as $\sigma^2$ tends to zero) G will be Pólya m. In
such cases Theorem 2 can still be applied, provided that, whenever (11) does
not define $\mu$ uniquely, $\mu$ is taken as the appropriate limit as $\sigma^2$ tends to zero.

The authors are grateful to the referee for his suggestions for improving the
presentation of the paper, and for clarifying the situation in the last section.
MEASURABILITY OF EXTENSIONS OF CONTINUOUS RANDOM
TRANSFORMS


By OTTO HANŠ


Czechoslovak Academy of Sciences, Prague


1. Summary. Extension theorems of Tietze and Hahn-Banach play an im- 
portant part in functional analysis. It seems reasonable to deal with similar 
questions for random transforms. In the present paper some measurability prob- 
lems arising in connection with this probabilistic generalization are solved. 


2. Introduction. First of all we shall introduce some convenient notions, defini- 
tions of which follow those given in [1]. 

Let $(\Omega, S)$ and $(Z, \mathfrak{Z})$ be two measurable spaces and U a mapping of the
space $\Omega$ into the space Z so that the inclusion

$\{\{\omega : U(\omega) \in B\} : B \in \mathfrak{Z}\} \subset S$

holds. Then the mapping U will be called a generalized random variable, or,
more precisely, a generalized random variable with values in the space $(Z, \mathfrak{Z})$.
If $(\Omega, S)$ and $(Z, \mathfrak{Z})$ are two measurable spaces, X an arbitrary non-empty
set and T a mapping of the Cartesian product $\Omega \times X$ into the space Z satisfying
the condition

$\{\{\omega : T(\omega, x) \in B\} : x \in X,\ B \in \mathfrak{Z}\} \subset S$,

then we shall speak about a random transform, or, more precisely, about a
random transform of the Cartesian product $\Omega \times X$ into the space $(Z, \mathfrak{Z})$.

Let us remark that in case Z is a metric space, we usually choose the $\sigma$-algebra
$\mathfrak{Z}$ as the class of all Borel subsets of the space Z. Under this additional
agreement about the $\sigma$-algebra $\mathfrak{Z}$, a number of theorems and criteria have been stated
in [1]. For the purposes of the present paper Criterion 6 is of most importance: 

If Z is a separable Banach space then a mapping U is a generalized random 
variable if, and only if, for every bounded linear functional f from a subset A 
of the first adjoint Banach space Z*, where the subset A is total on the whole 
Banach space Z, the compound mapping f(U) is a real-valued random variable. 

Some other definitions of a generalized random variable (or of a random 
element) have been given by other authors. Thus, for instance, Mourier [2] 
defines a random element only in the case Z is a Banach space in the following 
way: a mapping U is a random element if for every bounded linear functional f 
from the first adjoint space Z* the compound mapping f(U) is a real-valued 
random variable. Though for separable Banach spaces the definition of Mourier 
and the one of ours coincide, for arbitrary Banach spaces they differ. The defini- 
tion of Mourier enables one to prove that the sum of random elements is again 
a random element, while for generalized random variables this statement need
not hold as shown by Nedoma in [3]. On the other hand generalized random
variables possess the important property that each compound mapping T(U)
formed by means of a Borel measurable mapping T (of the measurable space
$(Z, \mathfrak{Z})$ into another measurable space $(Y, \mathfrak{Y})$) and a generalized random variable
U is a generalized random variable (with values in the measurable space $(Y, \mathfrak{Y})$).

Bharucha-Reid [4] follows essentially the definition of Mourier, provided his 
random elements have values in Orlicz spaces. 

The conception of Kolmogoroff and Prochorow [5] is a generalization of the 
notion of a stochastic process, while Dubins [6] defines a generalized random 
variable as a homomorphism of some Boolean algebra into the measure ring 
induced by some probability space. 

Let us remark that our definition does not depend on any probability measure
defined on the measurable space $(\Omega, S)$ and this is sometimes an advantage.


3. Probabilistic Tietze theorem. In what follows R denotes the space of all
real numbers and $\mathfrak{R}$ the $\sigma$-algebra of all Borel subsets of the space R.

THEOREM 1: Let $(\Omega, S)$ be a measurable space, X a separable metric space, M
a closed subset of the space X and V a random transform of the Cartesian product
$\Omega \times M$ into the space $(R, \mathfrak{R})$, which is for every fixed $\omega \in \Omega$ a continuous mapping
$V(\omega, \cdot)$ of the set M into the space R, such that for every couple $(\omega, x) \in \Omega \times M$
the relation $|V(\omega, x)| \le s(\omega)$, where s is a real-valued random variable, holds.

Then there exists a random transform T of the Cartesian product $\Omega \times X$ into
the space $(R, \mathfrak{R})$ so that

(i) for every couple $(\omega, x) \in \Omega \times M$ we have $T(\omega, x) = V(\omega, x)$;

(ii) for every $\omega \in \Omega$ the mapping $T(\omega, \cdot)$ is a continuous function from X into R;

(iii) for every couple $(\omega, x) \in \Omega \times X$ we have $|T(\omega, x)| \le s(\omega)$.

PROOF: We shall essentially follow the construction in the nonprobabilistic
version of this theorem as given by Alexandroff (see pp. 182-183 in [7]), the
only difference being in the definition of sets $A_n(\omega)$ and $B_n(\omega)$. For the sake of
definiteness we shall briefly describe the construction of the random transform T.

We set $V_0(\omega, x) = V(\omega, x)$ for every couple $(\omega, x) \in \Omega \times M$, and for every
n = 0, 1, 2, ... we use the following recursive formulae: For every $\omega \in \Omega$ we define

$A_n(\omega) = \{x : V_n(\omega, x) < -(2^n/3^{n+1}) \cdot s(\omega)\}$

and

$B_n(\omega) = \{x : V_n(\omega, x) > (2^n/3^{n+1}) \cdot s(\omega)\}$.

Let $\rho(x, y)$ and $\rho(x, A)$ denote the distance from the point x to the point y or
to the set A. Then for every couple $(\omega, x) \in \Omega \times X$ we put (the modification in
case $A_n(\omega)$ or $B_n(\omega)$ is empty is omitted)

$T_n(\omega, x) = (2/3)^{n+1} \cdot s(\omega) \cdot \dfrac{\rho(x, A_n(\omega))}{\rho(x, A_n(\omega)) + \rho(x, B_n(\omega))} - \dfrac{2^n \cdot s(\omega)}{3^{n+1}}$

and for every couple $(\omega, x) \in \Omega \times M$

$V_{n+1}(\omega, x) = V_n(\omega, x) - T_n(\omega, x)$.

Finally for every couple $(\omega, x) \in \Omega \times X$ we define

$T(\omega, x) = \lim_{n \to \infty} \sum_{k=0}^{n} T_k(\omega, x)$.

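
For a single fixed $\omega$ the construction is the classical Tietze iteration, and it can be traced numerically. The Python sketch below is an illustration under simplifying assumptions that are not part of the theorem: X is the real line with the usual distance, M is a finite set of reals, and the omitted modification for empty sets is filled in by treating the distance to an empty set as infinite (the ratio is then taken as 1 when only $A_n$ is empty, 0 when only $B_n$ is empty, and 1/2 when both are). All function names are hypothetical.

```python
import math


def distance_to_set(x, points):
    """Distance from x to a finite set of reals; infinite if the set is empty."""
    return min((abs(x - y) for y in points), default=math.inf)


def tietze_term(x, A, B, c):
    """T_n at x for level c = (2/3)^n * s: (2c/3) * rho_A/(rho_A + rho_B) - c/3."""
    d_a, d_b = distance_to_set(x, A), distance_to_set(x, B)
    if math.isinf(d_a) and math.isinf(d_b):
        ratio = 0.5          # both sets empty (assumed convention)
    elif math.isinf(d_a):
        ratio = 1.0          # A_n empty
    else:
        ratio = d_a / (d_a + d_b)   # gives 0 when B_n is empty
    return (2 * c / 3) * ratio - c / 3


def tietze_extension(values, iterations=60):
    """Extend V (a dict: point of M -> value) to a function T on the real line."""
    s = max(abs(v) for v in values.values())
    residual = dict(values)            # V_n restricted to M
    steps = []
    for n in range(iterations):
        c = (2 / 3) ** n * s
        A = [x for x, v in residual.items() if v < -c / 3]
        B = [x for x, v in residual.items() if v > c / 3]
        steps.append((A, B, c))
        for x in residual:             # V_{n+1} = V_n - T_n on M
            residual[x] -= tietze_term(x, A, B, c)

    def T(x):
        return sum(tietze_term(x, A, B, c) for A, B, c in steps)

    return T


T = tietze_extension({0.0: 1.0, 1.0: -0.5, 2.5: 0.25})
print(T(0.0), T(1.0), T(2.5), T(1.7))
```

After n steps the residual on M is at most $(2/3)^n s$ in absolute value, so the partial sums agree with V on M up to that error while staying within the bound s everywhere.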


It can be easily seen that the mapping T satisfies the three numbered
requirements, hence we need only prove its measurability. First we shall prove that the
mapping $\rho(x, A_n(\cdot))$ is for every $x \in X$ and for every n = 0, 1, 2, ... a
real-valued random variable. We have for every $c \ge 0$ the equality

(1)  $\{\omega : \rho(x, A_n(\omega)) > c\} = \bigcup_{k=1}^{\infty} \bigcap_{y \in O(x,k)} \{\omega : V_n(\omega, y) \ge -(2^n \cdot s(\omega)/3^{n+1})\}$,

where O(x, k) is a countable set dense in the set $\{y : \rho(x, y) \le c + (1/k)\} \cap M$.
Indeed, if $\omega_0$ belongs to the set on the left hand side of (1), then there exists a
positive integer $k_0$ (dependent in general on $\omega_0$) so that

$\inf_{y \in A_n(\omega_0)} \rho(x, y) > c + (1/k_0)$

and hence the set $A_n(\omega_0)$ and the set $\{y : \rho(x, y) \le c + (1/k_0)\}$ are disjoint.
Therefore for this $k_0$ and for every $y \in O(x, k_0)$ we have

(2)  $V_n(\omega_0, y) \ge -(2^n \cdot s(\omega_0)/3^{n+1})$

and this means that $\omega_0$ belongs to the set on the right hand side of (1). Conversely,
let $\omega_0$ belong to the set on the right hand side of (1). Then there exists such a
positive integer $k_0$ that for every $y \in O(x, k_0)$ the inequality (2) holds. Since
the mapping $V_n(\omega_0, \cdot)$ is continuous, the inequality (2) holds for every element
of the set $\{y : \rho(x, y) \le c + (1/k_0)\} \cap M$ and therefore the sets $A_n(\omega_0)$ and
$\{y : \rho(x, y) \le c + (1/k_0)\}$ are disjoint. Hence

$\inf_{y \in A_n(\omega_0)} \rho(x, y) \ge c + (1/k_0) > c$

and $\omega_0$ belongs also to the set on the left hand side of (1). Thus, provided $V_n$
is a random transform, we have that $\rho(x, A_n(\cdot))$ is a real-valued random variable
for every $x \in X$ and quite a similar consideration holds for the mapping
$\rho(x, B_n(\cdot))$. Therefore from the measurability of the mapping $V_n$ it follows that
both $T_n$ and $V_{n+1}$ are also random transforms. Since $V_0$ is a random transform,
the same holds for T. The proof is complete.


4. Probabilistic Hahn-Banach theorem. The next theorem forms a probabil- 
istic version of the well-known Hahn-Banach theorem for normed linear spaces. 

THEOREM 2: Let $(\Omega, S)$ be a measurable space, $X$ a separable real normed
linear space, $M$ a linear manifold in the space $X$, and $V$ a random transform of
the Cartesian product $\Omega \times M$ into the space $(R, \mathfrak{R})$, satisfying the following
conditions:

for every $\omega \in \Omega$, $\alpha \in R$, $\beta \in R$, $x \in M$ and $y \in M$

$$V(\omega, \alpha x + \beta y) = \alpha V(\omega, x) + \beta V(\omega, y);$$

for every couple $(\omega, x) \in \Omega \times M$ we have $|V(\omega, x)| \le s(\omega) \cdot \|x\|$, provided the
mapping $s$ of the space $\Omega$ into the space $R$ is for every $\omega \in \Omega$ defined by the formula
$s(\omega) = \sup_{x \in O \cap M} |V(\omega, x)|$, where $O = \{x : \|x\| = 1\}$.

Then there exists a random transform $T$ of the Cartesian product $\Omega \times X$ into the
space $(R, \mathfrak{R})$ so that

(iv) for every couple $(\omega, x) \in \Omega \times M$ we have $T(\omega, x) = V(\omega, x)$;

(v) for every $\omega \in \Omega$, $\alpha \in R$, $\beta \in R$, $x \in X$ and $y \in X$ there holds
$T(\omega, \alpha x + \beta y) = \alpha T(\omega, x) + \beta T(\omega, y)$;

(vi) for every couple $(\omega, x) \in \Omega \times X$ we have $|T(\omega, x)| \le s(\omega) \cdot \|x\|$.

Proof: First of all we shall describe the construction of the mapping $T$.

Since $X$ is separable, there exists a countable set $\{x_1, x_2, \dots\} \subset X - M$
dense in the set $X - M$. For every $n = 0, 1, 2, \dots$ let the symbol $M_n$ denote
the linear manifold generated by the set $M \cup \bigcup_{i=1}^{n} \{x_i\}$ and let
$X_0 = \bigcup_{n=0}^{\infty} M_n$. We set for every couple $(\omega, x) \in \Omega \times M_0$,

$$V_0(\omega, x) = V(\omega, x)$$

and for every couple $(\omega, x) \in \Omega \times (X_0 - M_0)$,

$$V_0(\omega, x) = 0.$$

Then for every $n = 1, 2, \dots$ we define recursively for every couple
$(\omega, x) \in \Omega \times (X_0 - M_n)$

$$V_n(\omega, x) = V_{n-1}(\omega, x) = 0$$

and for every $\omega \in \Omega$, $x \in M_{n-1}$ and $t \in R$

$$V_n(\omega, x + t x_n) = V_{n-1}(\omega, x) + t \sup_{z \in M_{n-1}} \bigl(V_{n-1}(\omega, z) - s(\omega) \cdot \|z - x_n\|\bigr).$$

Further we put for every couple $(\omega, x) \in \Omega \times X_0$

$$T(\omega, x) = T_0(\omega, x) = \lim_{n \to \infty} V_n(\omega, x),$$

and finally for every $y \in X - X_0$ which can be written in the form $y = \lim_{n \to \infty} y_n$,
where $y_n \in X_0$ for every $n = 1, 2, \dots$, we set

$$T(\omega, y) = \lim_{n \to \infty} T_0(\omega, y_n).$$


It is well known that for every $\omega \in \Omega$ the mapping $T(\omega, \cdot)$ is a bounded linear
functional which is an extension of the bounded linear functional $V(\omega, \cdot)$ from
the linear manifold $M$ to the whole space $X$ with preservation of the norm. Thus,
only measurability remains to be proved. However, we can write

$$\{\omega : s(\omega) \le c\} = \bigcap_{x \in \tilde{O}} \{\omega : |V(\omega, x)| \le c\},$$

where $\tilde{O}$ is a countable set dense in the set $O$. Since $V$ is a random transform, the
mapping $V_0$ is a random transform of the Cartesian product $\Omega \times X_0$ into the
space $(R, \mathfrak{R})$. Further we have for every $c \in R$

$$\Bigl\{\omega : \sup_{z \in M_{n-1}} \bigl(V_{n-1}(\omega, z) - s(\omega) \cdot \|z - x_n\|\bigr) \le c\Bigr\}
= \bigcap_{z \in \tilde{M}_{n-1}} \bigl\{\omega : V_{n-1}(\omega, z) - s(\omega) \cdot \|z - x_n\| \le c\bigr\},$$

where $\tilde{M}_{n-1} \subset M_{n-1}$ is a countable set dense in the set $M_{n-1}$. Thus, $T_0$ is a
random transform of the Cartesian product $\Omega \times X_0$ into the space $(R, \mathfrak{R})$ and
therefore the mapping $T$ is a random transform of the Cartesian product $\Omega \times X$
into the space $(R, \mathfrak{R})$ and this proves Theorem 2.

Theorem 2 states that for separable Banach spaces a probabilistic version 
of the Hahn-Banach theorem is valid. Theorem 3 below shows that for (not 
necessarily separable) Hilbert spaces an equivalent statement is also true. 


5. Conclusion. It would be of interest to know the extent to which Theorems 
1 and 2 remain valid if the separability assumption is dropped. In this case the 
methods of proof used above obviously fail. Unfortunately, the author has not 
succeeded either in proving the non-separable versions or in constructing
appropriate counterexamples. To get other positive results it seems necessary to
impose further assumptions on the space X. Thus, a statement equivalent to Theorem 2
is true for not necessarily separable Hilbert spaces, owing to the possibility of 
defining an orthogonal complement to a given subspace. 

THEOREM 3: Theorem 2 remains valid provided X is a Hilbert space and M a 
Hilbert subspace of the space X. 

Proof. Since every element $x \in X$ can be uniquely written in the
form $x = x_1 + x_2$, where $x_1 \in M$ and $x_2 \perp M$, we can set for every $\omega \in \Omega$

$$T(\omega, x) = V(\omega, x_1)$$

and Theorem 3 follows immediately from this construction.

Finally, let us briefly sketch an application of our results. 

The well-known Banach-Mazur Theorem asserts that every separable metric
(Banach) space M can be imbedded in an isometric (isometric and isomorphic)
way into the space C of all continuous functions defined on the closed interval
[0, 1]. This theorem sometimes enables us to treat generalized random variables
with values in the space $(M, \mathfrak{M})$ as generalized random variables with values
in the space $(C, \mathfrak{C})$. This is the case in Theorem 16 in [1], where the
measurability of the set $\{\omega : \bigcup_{n=1}^{\infty} \{V_n(\omega)\} \text{ is strongly compact}\}$ must be proved.
Another example is Criterion 6 in [1] (for wording see Introduction of this paper).
In both these cases the above mentioned treatment provides a simple and
elegant proof of the statement in question.

Using Theorems 1 and 2 we are able to enlarge the number of problems in
which not only generalized random variables with values in the space $(M, \mathfrak{M})$
are considered, but also random transforms of the Cartesian product $\Omega \times M$
into the space $(R, \mathfrak{R})$. In the present paper we shall mention only one problem
of this kind, namely the Representation Theorem for random Schwartz distributions
which, roughly speaking, reads: Every random Schwartz distribution can
be represented on every compact interval with arbitrarily great probability as a
derivative of a strictly continuous stochastic process. This theorem was proved
by Ullrich [8] who applied our Theorem 2 with $X$ replaced by $C$ and $M$ by $K_r$,
where $K_r$ stands for the space of $r$th derivatives of all continuous functions $f$
defined on a closed interval $[a, b]$ that have derivatives of all orders, the functions
$f$ themselves and their derivatives taking the value 0 at both ends of the interval
$[a, b]$.
A CONVOLUTIVE CLASS OF MONOTONE LIKELIHOOD
RATIO FAMILIES


By S. G. Ghurye and David L. Wallace
University of Chicago


1. Introduction. This note stems from the following problem posed us by J.
Loevinger.¹ Let $X_1, \dots, X_n$ and $Y$ be real-valued random variables such that,
conditionally on $Y$, the $\{X_i\}$ are mutually independent with

$$p_i(Y) = \Pr\{X_i = 1 \mid Y\} = 1 - \Pr\{X_i = 0 \mid Y\}$$

and $p_i(y)$ is nondecreasing in $y$. Let $S = X_1 + \cdots + X_n$. Is $E\{Y \mid S = r\}$ a
nondecreasing function of $r$? The answer, yes, will follow from showing that
$\Pr\{S = r + 1 \mid Y\}/\Pr\{S = r \mid Y\}$ is a nondecreasing function of $Y$ for each $r$.
Here we have a simple case of the convolution of families of distributions with 
monotone likelihood ratios (hereafter MLR) being an MLR family. It is easy 
to see that the convolution of two MLR families is not necessarily MLR. In 
Section 2, a sufficient condition on MLR families is given that their convolution 
be MLR. In Section 3, some special results are given for multidimensional 
distributions. The problem leading to this work is discussed in Section 4. 

The MLR property is identical with the Pélya type 2 property (cf. [2]). The 
definitions used here extend to Pélya type m but the extended results, except for 
Lemma 4, are not generally true for m > 2. 


2. Convolutions of MLR families. Let $G$ be an ordered additive group, let
$\Theta$ be an ordered set, and let $\mu$ be an invariant, $\sigma$-finite measure on $G$. Throughout
this section, a family $f$ will mean a real-valued, nonnegative function on
$G \times \Theta$, such that $f(x, \theta)$ is measurable in $x$ for each $\theta$ and

$$0 < \int_G f(x, \theta)\, d\mu(x) < \infty.^{2}$$

Ordinarily, $f$ is a family of probability densities relative to $\mu$ for a random variable
with range contained in $G$ and with parameter space $\Theta$. The convolution family
$f * g$ of two families $f$ and $g$ is defined by

$$f * g(x, \theta) = \int_G f(x - \xi, \theta)\, g(\xi, \theta)\, d\mu(\xi).$$

The spaces $G$ and $\Theta$ must be ordered for the definitions which follow and $G$
must be at least a semigroup for convolutions to be defined in the same space.


Received January 14, 1959; revised April 15, 1959. 

1 The consultation with Dr. Loevinger (Jewish Hospital of St. Louis) and the research 
connected with this paper were performed under a grant from the Rockefeller Foundation. 

2 For some purposes, such as in Section 3, it would be convenient to permit the integral
to be zero for some $\theta$. Lemma 3 and the theorems of this section clearly hold under this
extension.
The group requirement is only a slight restriction. G will ordinarily be the real
line or the integers, and is taken as an ordered group primarily because it permits 
a simple unified treatment at no extra cost. 

Definition. A nonnegative function $h$ defined on the product of two ordered
sets $X$, $Y$, is Pólya type 2 if, for all $x, x' \in X$ and $y, y' \in Y$ such that $x \le x'$,
$y \le y'$,

$$h(x, y)h(x', y') - h(x, y')h(x', y) \ge 0.$$

Definition A. A family $f$ has property A if, as a function on $G \times \Theta$, it is Pólya
type 2.

Definition B. A family $f$ has property B if, for each $\theta \in \Theta$, the function $h_\theta$ on
$G \times G$ defined by

$$h_\theta(x, \xi) = f(x - \xi, \theta)$$

is Pólya type 2.

Definition C. A family $f$ has property C if, for all $x, x', y, y' \in G$ such that
$y \le x \le y'$ and $x + x' = y + y'$ and all $\theta, \theta' \in \Theta$ such that $\theta \le \theta'$,

$$f(x, \theta)f(x', \theta') \ge f(y', \theta)f(y, \theta').$$

Property A is the monotone likelihood ratio or Pólya type 2 property for the
family $f$. Property B is the monotone likelihood ratio property for the location
parameter family generated by $f(\cdot, \theta)$ for each fixed $\theta$. Provided all quantities
used as divisors are positive, the definitions of properties A and B can be
expressed in the more intuitive form:

A: $f(x, \theta')/f(x, \theta)$ nondecreasing in $x$ for all $\theta < \theta'$, or

A: $f(x + h, \theta)/f(x, \theta)$ nondecreasing in $\theta$ for all $x$ and all $h > 0$, and

B: $f(x + h, \theta)/f(x, \theta)$ nonincreasing in $x$ for all $\theta$ and all $h > 0$.

Note that on taking $x = y$ and $x' = y'$ in C, one obtains A; that on taking $\theta = \theta'$
in C, one obtains B. We shall now show that property C is, in fact, equivalent
to A and B together, and that it is invariant under convolution.

It may be helpful to note that all results and methods of this paper are
unaffected if any $f(x, \theta)$ is multiplied by any positive function of $\theta$. Multiplication
by a positive function of $x$ does not destroy MLR, but does affect the convolution
and its MLR properties.

Lemma 1: If $f$ has property B, then the set $I_f(\theta) = \{x : f(x, \theta) > 0\}$ is, for every
$\theta \in \Theta$, an interval of $G$; i.e., $y \in I$, $y' \in I$ imply $x \in I$ for all $x \in G$ such that
$y \le x \le y'$.

Proof: Suppose $f(y, \theta)f(y', \theta) > 0$ and $y < y'$. To any $x \in G$ such that
$y \le x \le y'$, there corresponds an $x' \in G$ such that $x + x' = y + y'$ and by
property B, $f(x, \theta)f(x', \theta) \ge f(y, \theta)f(y', \theta) > 0$.

Thus, for each $\theta$, there is a decomposition of $G$ into three intervals $M(\theta)$,
$I(\theta)$, $M'(\theta)$ such that $x \in M$, $y \in I$, $z \in M'$ imply $x < y < z$ and $f(x, \theta) =
f(z, \theta) = 0$, $f(y, \theta) > 0$. For all $\theta$, $I_f(\theta)$ is nonempty, though it may contain
only one point. $M(\theta)$ and $M'(\theta)$ may be empty.

Lemma 2: If $f$ has properties A and B, then for any $\theta, \theta' \in \Theta$ such that $\theta < \theta'$,
$M(\theta) \subset M(\theta')$ and $M'(\theta) \supset M'(\theta')$.

Proof: For any $\theta$, choose $x' \in I(\theta)$. Then for any $x \in M(\theta)$, $x < x'$ and using
property A, $f(x, \theta') = 0$. Since this holds for all $y \le x$, then $x \in M(\theta')$. A similar
proof holds for $M'(\theta)$.


Lemma 3: $f$ has property C if and only if it has properties A and B.

Proof. That C implies A and B is immediate. Property C is nontrivial only
when $y' \in I(\theta)$ and $y \in I(\theta')$. Since $y < y'$, we know from Lemma 2 that
$f(y, \theta) > 0$. Using successively A and B,

$$f(y, \theta)f(x, \theta)f(x', \theta') \ge f(y, \theta')f(x, \theta)f(x', \theta) \ge f(y, \theta')f(y, \theta)f(y', \theta)$$

and C follows by division.

Lemma 4. (Schoenberg [5]). If $f$ and $g$ have property B, then $f * g$ has property B.

Schoenberg's proof for the real line extends immediately to the group $G$. He
proves this result in its Pólya type $m$ form.

THEOREM 1: If $f$ and $g$ have property C, then $f * g$ has property C.

Proof: Using Lemmas 3 and 4, it remains only to show that $f * g$ has property
A, i.e., for $x < x' = x + h$, $\theta < \theta'$,

$$\Delta = [f * g(x, \theta)][f * g(x', \theta')] - [f * g(x', \theta)][f * g(x, \theta')] \ge 0.$$

Throughout the proof, write $f(x) = f(x, \theta)$ and $f'(x) = f(x, \theta')$, and likewise for $g$. Then

$$\Delta = \iint \bigl[f(u)g(x - u)f'(v)g'(x' - v) - f'(u)g'(x - u)f(v)g(x' - v)\bigr]\, d[\mu(u) \times \mu(v)] = I_1 + I_2 + I_3,$$

in which $I_1$, $I_2$, $I_3$ are respectively the integrals over the sets $u > v$, $u = v$, $u < v$.
Interchange $u$ and $v$ in $I_3$ and incorporate with $I_1$:

$$I_1 + I_3 = \iint_{u > v} \bigl\{f(u)f'(v)\,[g(x - u)g'(x' - v) - g(x' - u)g'(x - v)]
+ f'(u)f(v)\,[g(x - v)g'(x' - u) - g'(x - u)g(x' - v)]\bigr\}\, d[\mu(u) \times \mu(v)].$$

For $u > v$, the quantity in the second brackets is nonnegative by C and its
coefficient $f'(u)f(v) \ge f(u)f'(v)$. Then,

$$I_1 + I_3 \ge \iint_{u > v} f(u)f'(v)\,\bigl[g(x - u)g'(x' - v) - g'(x - u)g(x' - v)
+ g(x - v)g'(x' - u) - g(x' - u)g'(x - v)\bigr]\, d[\mu(u) \times \mu(v)]$$

and

$$I_1 + I_3 \ge \iint_{u > v} f(u)f'(v)\,[g(x - u)g'(x' - v) - g'(x - u)g(x' - v)]\, d[\mu(u) \times \mu(v)]$$
$$\qquad + \iint_{u - v > 2h} f(u)f'(v)\,[g(x - v)g'(x' - u) - g(x' - u)g'(x - v)]\, d[\mu(u) \times \mu(v)]$$
$$\qquad + \iint_{0 \le u - v \le 2h} f(u)f'(v)\,[g(x - v)g'(x' - u) - g(x' - u)g'(x - v)]\, d[\mu(u) \times \mu(v)]$$
$$= J_1 + J_2 + J_3 \text{ respectively.}$$

In $J_2$ make the transformation $u = u' + h$, $v = v' - h$, suppress the primes
on $u'$, $v'$, and recall that $h = x' - x > 0$. Then

$$J_1 + J_2 = \iint_{u > v} \{f(u)f'(v) - f(u + h)f'(v - h)\}
\cdot [g(x - u)g'(x' - v) - g'(x - u)g(x' - v)]\, d[\mu(u) \times \mu(v)] \ge 0.$$

Break $J_3$ into three integrals respectively over the sets $0 \le u - v < h$,
$h < u - v \le 2h$, $u - v = h$. The third integral vanishes, and on making the
transformation $u = v' + h$, $v = u' - h$ in the second and suppressing the primes on $u'$,
$v'$, we get

$$J_3 = \iint_{0 \le u - v < h} \{f(u)f'(v) - f(v + h)f'(u - h)\}
\cdot [g(x - v)g'(x' - u) - g(x' - u)g'(x - v)]\, d[\mu(u) \times \mu(v)] \ge 0.$$

Hence $\Delta \ge 0$, and the theorem is proved.

Remark. This result would appear to be subsumed under Theorem 3
of Lehmann [3], taking $g_\theta(x, \xi) = f(x - \xi, \theta)$ and $d\lambda_\theta(\xi) = g(\xi, \theta)\, d\mu(\xi)$.
However, in lines 6 to 8 of page 410 of his proof, an additional assumption is needed,
which is not met in our case.³

Corollary 1: If $\Theta = G$, $f(x, \theta) = f(x - \theta)$, $g(x, \theta) = g(x - \theta)$ and $f$
and $g$ have property A, then $f * g$ has property A. (This result for location
parameter families is known and is just the Schoenberg result of Lemma 4.)

Corollary 2: If $G$ is the set of integers, if for each $i = 1, \dots, n$,

$$f_i(x, \theta) = \begin{cases} p_i(\theta) & x = 1 \\ 1 - p_i(\theta) & x = 0 \\ 0 & \text{otherwise} \end{cases}$$

and $p_i(\theta)$ is nondecreasing in $\theta$, then each $f_i$ and the convolution $f_1 * \cdots * f_n$ have
property A.
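Corollary 2 can be checked numerically on a small grid. The following is only a sketch under stated assumptions: the three item curves in `ITEMS` and the grid `THETAS` are hypothetical choices made for illustration, not taken from the paper, and exact rational arithmetic is used so that the inequalities are tested without rounding.

```python
from fractions import Fraction


def convolve(f, g):
    """Convolve two densities supported on 0, 1, 2, ... (lists of masses)."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def sum_density(ps):
    """Density of X_1 + ... + X_n for independent Bernoulli(p_i) variables."""
    dens = [Fraction(1)]
    for p in ps:
        dens = convolve(dens, [1 - p, p])
    return dens


def has_property_a(family):
    """family[k] is the density at the k-th smallest theta.

    Checks f(x, t) f(x', t') >= f(x, t') f(x', t) for all x <= x', t <= t'.
    """
    for k, low in enumerate(family):
        for high in family[k:]:
            for x in range(len(low)):
                for x2 in range(x, len(low)):
                    if low[x] * high[x2] < low[x2] * high[x]:
                        return False
    return True


# Hypothetical item curves p_i(theta), each nondecreasing on (0, 1).
ITEMS = [lambda t: t, lambda t: t * t, lambda t: (1 + t) / 2]
THETAS = [Fraction(k, 10) for k in range(1, 10)]
FAMILY = [sum_density([p(t) for p in ITEMS]) for t in THETAS]

print(has_property_a(FAMILY))
```

Reversing the order of the parameter grid makes the same check fail, which is a quick way to confirm that the test is not vacuous.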

That B is not a necessary property for the convolution of two MLR families 
to be MLR is shown by the construction below based on the following theorem, 
whose proof is a simple computation. 

THEOREM 2: If f has property A and if, for each 0, the range of x for which 
f(x, 0) > 0 is contained in (0, 1, 2), then f * f has property A. 

This result does not extend in general to nonidentical convolutions, to
threefold identical convolutions, or to four-point ranges.

³ We wish to thank Professor S. Karlin for calling this fact to our attention.

A family $f$ which satisfies Theorem 2 but does not have property B is easily
constructed by taking $\Theta$ as the real line, $a(\theta)$, $b(\theta)$ as increasing functions on
$\Theta$ such that $0 < a(\theta) < b(\theta) < \infty$, and letting $f(\cdot, \theta)$ be the distribution with
probabilities at 0, 1 and 2 respectively given by

$$c(\theta), \quad a(\theta)c(\theta), \quad a(\theta)b(\theta)c(\theta)$$

with $c(\theta) = [1 + a(\theta) + a(\theta)b(\theta)]^{-1}$.
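A small numerical illustration of this construction may help. The sketch below assumes the hypothetical choices $a(\theta) = \theta$ and $b(\theta) = 2\theta$ on the grid $\theta = 1, \dots, 5$ (any increasing pair with $0 < a < b$ would serve); it checks that the family fails property B while both the family and its two-fold convolution pass property A.

```python
from fractions import Fraction


def family(a, b):
    """Masses at 0, 1, 2 equal to c, a*c, a*b*c with c = 1 / (1 + a + a*b)."""
    c = 1 / (1 + a + a * b)
    return [c, a * c, a * b * c]


def convolve(f, g):
    """Convolve two densities supported on 0, 1, 2, ... (lists of masses)."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def mass(f, k):
    """Mass of f at integer k, zero outside the listed support."""
    return f[k] if 0 <= k < len(f) else Fraction(0)


def has_property_b(f):
    """Checks f(x - s) f(x' - s') >= f(x - s') f(x' - s) for x <= x', s <= s'."""
    n = len(f)
    for x in range(2 * n):
        for x2 in range(x, 2 * n):
            for s in range(n):
                for s2 in range(s, n):
                    left = mass(f, x - s) * mass(f, x2 - s2)
                    right = mass(f, x - s2) * mass(f, x2 - s)
                    if left < right:
                        return False
    return True


def has_property_a(fam):
    """fam[k] is the density at the k-th smallest theta (monotone likelihood ratio)."""
    for k, low in enumerate(fam):
        for high in fam[k:]:
            for x in range(len(low)):
                for x2 in range(x, len(low)):
                    if low[x] * high[x2] < low[x2] * high[x]:
                        return False
    return True


# Hypothetical increasing functions a(theta) = theta, b(theta) = 2 * theta.
THETAS = [Fraction(t) for t in range(1, 6)]
SINGLE = [family(t, 2 * t) for t in THETAS]
TWOFOLD = [convolve(f, f) for f in SINGLE]

print(has_property_b(SINGLE[0]), has_property_a(SINGLE), has_property_a(TWOFOLD))
```

Property B fails because the successive ratios $a$ and $b$ of the three masses increase in $x$ when $a < b$.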


3. Some results for multivariate distributions. A family of generalized densities
$f(x, \theta)$, where $x$ is a vector, is said to be MLR (or Pólya type 2) if it is MLR
along each increasing curve, i.e., if for every vector function $x(t)$ of the real
parameter $t$ for which the components are nondecreasing functions of $t$, $g(t, \theta) =
f[x(t), \theta]$ is MLR in $t$ and $\theta$. (Cf. Lehmann [3], Pratt [4].) The definition can
also be stated in the form:

$f(x_1, \dots, x_K, \theta)$ is MLR if, for all $x_i \le x_i'$, $i = 1, \dots, K$, $\theta \le \theta'$,

$$f(x_1, \dots, x_K, \theta)\, f(x_1', \dots, x_K', \theta') \ge f(x_1', \dots, x_K', \theta)\, f(x_1, \dots, x_K, \theta').$$

We consider only the simplest problem of extending Corollary 2 to families
of distributions on the vertices of the cube or the simplex in $K$ dimensions. In
two dimensions already, two MLR families on the points (0, 0), (0, 1), (1, 0)
need not have an MLR convolution (Counterexample 1). Restricting
consideration to $n$-fold convolutions of a single family, the $n$-fold convolution of an
MLR family on the vertices of the square is MLR (Theorem 3), but even the
two-fold convolution of an MLR family on the vertices of the three-dimensional
cube need not be MLR (Counterexample 2). However, the $n$-fold convolution
of an MLR family on the vertices of the $K$-dimensional simplex is MLR for
all $n$ and $K$ (Theorem 4).

Counterexample 1. The convolution of two MLR families $f_1$ and $f_2$ on the points
(0, 0), (1, 0), (0, 1) need not be MLR: Let $a(\theta)$ be a positive, increasing function
of $\theta$ and let $f_1$ place nonzero probabilities only on the three points (0, 0),
(1, 0), (0, 1) proportional, respectively, to 1, 2, $a(\theta)$. Let $f_2(x, \theta) = 1/3$ at each
point. Both are MLR. Then at the two points (0, 1) and (1, 1), $f_1 * f_2$ has
probabilities proportional, respectively, to $[1 + a(\theta)]$ and $[2 + a(\theta)]$, and hence
$f_1 * f_2$ is not MLR.

Counterexample 2. The two-fold convolution of an MLR family on the vertices
of the three-dimensional cube need not be MLR: Let $f$ place nonzero probabilities
only on the five points (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0) proportional,
respectively, to 1, $a(\theta)$, $a(\theta)$, 2, $a(\theta)$, with $a(\theta)$ as above. Then $f$ is MLR
but $f * f$ is not, since at the points (1, 1, 0) and (1, 1, 1) the probabilities are
proportional, respectively, to $[1 + a(\theta)]$ and 2.
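Both counterexamples reduce to a short lattice convolution, sketched below. Unnormalized integer weights are used, since the ratio of two probabilities at a fixed parameter value does not depend on the normalizing constant; the values 1, 2, 3 passed in for $a(\theta)$ are hypothetical. In each case the ratio of the probability at the larger point to that at the smaller point decreases as $a(\theta)$ increases, which contradicts MLR.

```python
from fractions import Fraction
from itertools import product


def convolve(f, g):
    """Convolve two finitely supported weight functions on integer lattice points."""
    out = {}
    for (x, p), (y, q) in product(f.items(), g.items()):
        z = tuple(s + t for s, t in zip(x, y))
        out[z] = out.get(z, 0) + p * q
    return out


def example_1(a):
    """Ratio of f1 * f2 at (1, 1) to its value at (0, 1) when a(theta) = a."""
    f1 = {(0, 0): 1, (1, 0): 2, (0, 1): a}
    f2 = {(0, 0): 1, (1, 0): 1, (0, 1): 1}  # equal weights, i.e. 1/3 each
    h = convolve(f1, f2)
    return Fraction(h[(1, 1)], h[(0, 1)])


def example_2(a):
    """Ratio of f * f at (1, 1, 1) to its value at (1, 1, 0) when a(theta) = a."""
    f = {(0, 0, 0): 1, (1, 0, 0): a, (0, 1, 0): a, (0, 0, 1): 2, (1, 1, 0): a}
    h = convolve(f, f)
    return Fraction(h[(1, 1, 1)], h[(1, 1, 0)])


# Hypothetical increasing values of a(theta).
RATIOS_1 = [example_1(a) for a in (1, 2, 3)]  # (2 + a) / (1 + a)
RATIOS_2 = [example_2(a) for a in (1, 2, 3)]  # 2 / (1 + a)
print(RATIOS_1, RATIOS_2)
```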

THEOREM 3: If f is an MLR family on the four points (0, 0), (1, 0), (0, 1), 
(1, 1), then, for every n, the n-fold convolution of f with itself is MLR. 

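Proof. Let $f(x, \theta) = p_{ij}(\theta)$, for $x = (i, j)$, and let $q_{ij}(\theta)$ be the value of the
$n$-fold convolution of $f$ at $(i, j)$, which is given by

$$\sum q_{ij}(\theta)\, t^i u^j = \Bigl[\sum p_{ij}(\theta)\, t^i u^j\Bigr]^n. \tag{1}$$

We are given that for all $i \le i'$, $j \le j'$, $\theta \le \theta'$,

$$p_{ij}(\theta)\, p_{i'j'}(\theta') \ge p_{ij}(\theta')\, p_{i'j'}(\theta), \tag{2}$$

and must show that for all $r \le r'$, $s \le s'$, $\theta \le \theta'$,

$$q_{rs}(\theta)\, q_{r's'}(\theta') \ge q_{rs}(\theta')\, q_{r's'}(\theta). \tag{3}$$

From (1), it follows that, for given $s$, the sequence $\{q_{rs}(\theta), r = 0, 1, \dots\}$ has
the generating function

$$P_s(t) = \binom{n}{s} \{p_{00}(\theta) + p_{10}(\theta)t\}^{n-s} \{p_{01}(\theta) + p_{11}(\theta)t\}^{s}$$

and, for given $r$, the sequence $\{q_{rs}(\theta), s = 0, 1, \dots\}$ has the generating function

$$Q_r(u) = \binom{n}{r} \{p_{00}(\theta) + p_{01}(\theta)u\}^{n-r} \{p_{10}(\theta) + p_{11}(\theta)u\}^{r}.$$

Both represent convolutions of two-point, one-dimensional families, MLR by
(2), and hence, by Corollary 2 to Theorem 1,

$$q_{rs}(\theta)\, q_{r's}(\theta') \ge q_{rs}(\theta')\, q_{r's}(\theta)$$

and

$$q_{rs}(\theta)\, q_{rs'}(\theta') \ge q_{rs}(\theta')\, q_{rs'}(\theta).$$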


The desired conclusion (3) follows easily if at least one of q,.(@), Grs( 0’), Grae (9), 
drs’ (8) is positive. (3) is trivially true unless q,.( 6 )q,.(@) > 0 and is one of the 
above special cases unless r’ > rand s’ > s. If r’ < 8’, then q,-.(@) > 0 implies 
either pu(@)pun(@) > O or pyo()pu(@) > O. In either case, g,(@) > O and (3) 
follows. A similar argument holds if 1’ > s’ and also ifr’ = s’ except when 
pul) > 0, pol?) = pu(@) = 0. But then, by the MLR property, also py( 6’) = 
pa(é’) = 0 and at 6, @ the distributions are one-dimensional along the diagonal. 
Hence Corollary 2 (for the group of diagonal integers) applies directly. 

THEOREM 4: If $f$ is an MLR family on the $K + 1$ vertices of the unit simplex
in $K$ dimensions, then, for every $n$, the $n$-fold convolution of $f$ with itself is MLR.

Proof: Let $\{p_j(\theta), j = 0, 1, \dots, K\}$ be the values of the family $f$ at the origin
and unit points of the $K$ axes respectively. Let $q_r(\theta)$ denote the value of the
$n$-fold convolution at the point $r = (r_1, \dots, r_K)$. We must show that for $\theta < \theta'$,
and $r$, $r'$ such that $r_j \le r_j'$, $j = 1, \dots, K$,

$$q_r(\theta)\, q_{r'}(\theta') \ge q_r(\theta')\, q_{r'}(\theta). \tag{4}$$

A generating function argument similar to that used above easily proves the
result when $r$ and $r'$ differ in only one coordinate. But if $q_{r'}(\theta) > 0$, then $p_j(\theta) > 0$
for all coordinates such that $r_j' > 0$, and hence $q_s(\theta) > 0$ for all $s$ such that $s_j \le r_j'$,
$j = 1, \dots, K$. Division by $q_s(\theta)$ is permissible and (4) follows easily by repeated
application of the result for changes in a single coordinate.


4. An application. The problem mentioned in the introduction arose in the
following way. Let $X_1, \dots, X_n$ denote the scores made by an individual on
$n$ test items with the value 1 for correct, 0 for incorrect. Let $S = X_1 + \cdots + X_n$.
Let $Y$ be a real random variable representing the individual's (unobservable)
position on a single scale (latent continuum) assumed to determine his
performance on the test according to the model: For each $i$,

$$f_i(1, Y) = p_i(Y) = \Pr\{X_i = 1 \mid Y\} = 1 - \Pr\{X_i = 0 \mid Y\}$$

and conditionally on $Y$, the $\{X_i\}$ are mutually independent. Let $\varphi$ be the
probability density function for $Y$, representing, perhaps, the distribution of the
ability $Y$ over some population. If it is assumed only that each $p_i$ is a
nondecreasing function, what can be said about the individual value of $Y$ conditionally
on the sum $S$ of the scores of the $n$ items? The answer is that the conditional
distribution function of $Y$ given $S = a$ lies to the left of that for $S = b > a$.
Hence, the conditional mean (or median or quantile) of $Y$ given $S$ is a
nondecreasing function of $S$.
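The monotonicity of the conditional mean can be seen on a toy instance. The sketch below is illustrative only: the four latent values in `YS`, the uniform prior `PRIOR` and the three item curves in `ITEMS` are hypothetical, and a discrete latent variable stands in for the density of $Y$.

```python
from fractions import Fraction


def sum_density(ps):
    """Density of S = X_1 + ... + X_n for independent Bernoulli(p_i) scores."""
    dens = [Fraction(1)]
    for p in ps:
        nxt = [Fraction(0)] * (len(dens) + 1)
        for s, m in enumerate(dens):
            nxt[s] += m * (1 - p)      # item answered incorrectly
            nxt[s + 1] += m * p        # item answered correctly
        dens = nxt
    return dens


# Hypothetical latent values, prior weights and nondecreasing item curves.
YS = [0, 1, 2, 3]
PRIOR = [Fraction(1, 4)] * 4
ITEMS = [
    lambda y: Fraction(1 + y, 5),
    lambda y: Fraction(1 + y * y, 11),
    lambda y: Fraction(1, 2),
]


def conditional_means():
    """E{Y | S = r} for r = 0, ..., n, computed by Bayes' rule."""
    cond = [sum_density([p(y) for p in ITEMS]) for y in YS]
    means = []
    for r in range(len(ITEMS) + 1):
        joint = [w * cond[k][r] for k, w in enumerate(PRIOR)]
        means.append(sum(y * j for y, j in zip(YS, joint)) / sum(joint))
    return means


MEANS = conditional_means()
print([float(m) for m in MEANS])
```

The third item curve is constant, which is allowed because the result only requires each curve to be nondecreasing.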

(The result would not be true without restrictions on the functions $p_i$ if, for
example, the difference between a correct and an incorrect score differed from
item to item.)

The result is an application of Corollary 2 to Theorem 1 and of the following
lemma.

Lemma. If $X$, $Y$ are real random variables, if $Y$ has density $\varphi(y)$ relative to
the measure $\nu$, if $X$ given $Y = y$ has the conditional density $f(x, y)$ relative to the
measure $\mu$, and if the family $f$ is MLR, then

$$\Pr\{Y \le a \mid X = x\} \ge \Pr\{Y \le a \mid X = x'\}$$

for all $a$ and all $x \le x'$.

The lemma can be proved by relatively simple calculations and is equivalent
to a result of Good [1].
1. Introduction and summary. Let x and y be two independent normal variates
each distributed with zero mean and a common variance; it is then well-known
that the quotient x/y follows the Cauchy law distributed symmetrically about
the origin. Now the question that naturally arises is whether we can obtain a
characterization of the normal distribution by this property of the quotient. This
converse problem can be more precisely formulated as follows:

Let x and y be two independently and identically distributed random variables
having a common distribution function F(x). Let the quotient w = x/y follow
the Cauchy law distributed symmetrically about the origin w = 0. Then the
question is whether F(x) is normal.

But this converse is not true in general. The author [1] has recently
constructed a very simple example of a non-normal distribution where the quotient
x/y follows the Cauchy law. Steck [7] has also given some examples of
non-normal distributions with this property of the quotient.

In the present paper we shall first derive some interesting general properties 
possessed by the class of distribution laws F(x) [Section 2]. In Section 3 we deduce 
a characterization of the normal distribution under some conditions on the dis- 
tribution function F(x). Finally in Section 4 we construct an example of a non- 
normal distribution function F(x) having finite moments of all orders where the 
quotient x/y follows the Cauchy law. The method of proof is essentially based 
on the applications of Fourier transforms of distribution functions. For the proof 


of Theorem 3.1 we require somewhat deeper results in the theory of analytic 
functions. 


2. Some general properties of F(x). We shall here discuss some general proper- 
ties of the class of distribution laws F(x). We first prove a lemma which is instru- 
mental in the proofs of the subsequent results. 

Lemma 2.1. Let x and y be two independently and identically distributed proper 
random variables having a common distribution function F(x) which is continuous 
at the origin x = 0. Let the quotient w = x/y have a distribution function G(w) 
symmetric about the origin. Then F(x) is also symmetric about the origin. 

Proof. As usual we assume that each of the distribution functions $F(x)$ and
$G(w)$ is everywhere continuous to the right. Then we have the following notations:

$$F(a) = \mathrm{Prob}(x \le a), \qquad G(a) = \mathrm{Prob}(w \le a),$$
$$F(a - 0) = \mathrm{Prob}(x < a), \qquad G(a - 0) = \mathrm{Prob}(w < a).$$

We also note that the origin $x = 0$ must be a continuity point of $F(x)$, as otherwise
the quotient $w$ assumes the indeterminate value 0/0 with a positive probability.
Now for $w > 0$ we have

$$\begin{aligned}
G(w) - G(0) &= \mathrm{Prob}[0 < x/y \le w] \\
&= \mathrm{Prob}[0 < x \le wy;\ y > 0] + \mathrm{Prob}[wy \le x < 0;\ y < 0] \\
&= \int_0^{\infty} [F(wy) - F(0)]\, dF(y) + \int_{-\infty}^{0} [F(0) - F(wy - 0)]\, dF(y) \\
&= \int_0^{\infty} [F(wy) - F(0)]\, dF(y) + \int_0^{\infty} [F(-wy - 0) - F(0)]\, dF(-y - 0).
\end{aligned} \tag{2.1}$$

Similarly we can show that for any $w > 0$

$$\begin{aligned}
G(0) - G(-w - 0) &= \mathrm{Prob}[-w \le x/y < 0] \\
&= \int_0^{\infty} [F(0) - F(-wy - 0)]\, dF(y) + \int_0^{\infty} [F(0) - F(wy)]\, dF(-y - 0).
\end{aligned} \tag{2.2}$$

Since $G(w)$ is symmetric about the origin $w = 0$, we have the relation

$$G(w) - G(0) = G(0) - G(-w - 0) \tag{2.3}$$

holding for all $w$.
Then using (2.1) and (2.2) together, we get from (2.3) the relation

$$\int_0^{\infty} [F(wy) + F(-wy - 0) - 2F(0)]\, dF(y)
+ \int_0^{\infty} [F(wy) + F(-wy - 0) - 2F(0)]\, dF(-y - 0) = 0 \tag{2.4}$$

holding for all $w > 0$.
Substituting

$$H(wy) = F(wy) + F(-wy - 0) - 2F(0)$$

in (2.4), we obtain

$$\int_0^{\infty} H(wy)\, dH(y) = 0 \tag{2.5}$$

holding for all $w > 0$. Here $H(y)$ is a function of bounded variation. We now
use the transformation $w = e^u$ and $y = e^v$ $(-\infty \le u \le \infty;\ -\infty \le v \le \infty)$ and
denote

$$H(y) = H(e^v) = H_1(v) \quad \text{and} \quad H(wy) = H(e^{u+v}) = H_1(u + v).$$

Here we note that $H_1(v)$ is also a function of bounded variation. Thus (2.5)
reduces to

$$\int_{-\infty}^{\infty} H_1(u + v)\, dH_1(v) = 0 \tag{2.6}$$

holding for all $u$ $(-\infty \le u \le +\infty)$. From (2.6) we see easily that the relation

$$\int_{-\infty}^{\infty} e^{itu}\, d\Bigl[\int_{-\infty}^{\infty} H_1(u + v)\, dH_1(v)\Bigr] = 0 \tag{2.7}$$

holds identically for all real $t$. Let

$$\psi(t) = \int_{-\infty}^{\infty} e^{itv}\, dH_1(v) \tag{2.8}$$

denote the Fourier transform of $H_1(v)$ which is a function of bounded variation.
Then using the theorem of Fourier transforms of convolutions of functions of
bounded variation we get from (2.7)

$$\psi(t)\psi(-t) = |\psi(t)|^2 = 0;$$

that is,

$$|\psi(t)| = 0 \tag{2.9}$$

holding identically for all real $t$, where $\psi(t)$ is defined in (2.8). Finally from the
uniqueness property of Fourier transforms of functions of bounded variation, it
follows immediately from (2.9) that $H_1(v)$ is a constant almost everywhere.
Hence

$$H(y) = F(y) + F(-y - 0) - 2F(0) = c, \quad \text{a.e.} \tag{2.10}$$

Next substituting $y = 0$ in (2.10) and noting that the origin $y = 0$ is a continuity
point of $F(y)$, we get $c = 0$ and thus (2.10) reduces to

$$F(y) + F(-y - 0) = 2F(0). \tag{2.11}$$

Finally we note $F(-\infty) = 0$ and $F(+\infty) = 1$ and obtain from (2.11) that
$F(0) = \tfrac{1}{2}$. Thus we have

$$F(y) = 1 - F(-y - 0), \tag{2.12}$$

which completes the proof.

Lemma 2.2. Let x and y be two independently and identically distributed random
variables having a common distribution function $F(x)$. Let the quotient $w = x/y$
follow the Cauchy law distributed symmetrically about the origin $w = 0$. Then $F(x)$
is absolutely continuous and has a continuous probability density function $f(x) =
F'(x) > 0$.

Proof. As a direct consequence of Lemma 2.1 it follows that $F(x)$ is also
symmetric about the origin $x = 0$. Let $F_0(x)$ denote the distribution function of
$|x|$. Then we can verify easily that

$$F_0(x) = \begin{cases} 0 & \text{for } x < 0 \\ 2F(x) - 1 & \text{for } x \ge 0. \end{cases} \tag{2.13}$$

Thus we note that in this case the distribution functions of $x$ and $w$ are uniquely
determined by the distribution functions of $|x|$ and $|w|$ respectively. We can
easily verify after elementary integration that the characteristic function of the
distribution of $\ln |w|$ is given by

$$E(e^{it \ln |w|}) = \frac{1}{\cosh\bigl(\frac{\pi}{2} t\bigr)}.$$

Then noting that $\ln |w| = \ln |x| - \ln |y|$ we get finally the relation

$$\varphi(t)\varphi(-t) = \frac{1}{\cosh\bigl(\frac{\pi}{2} t\bigr)} \tag{2.14}$$

holding for all real $t$, where $\varphi(t)$ denotes the characteristic function of the
distribution of $\ln |x|$. The relation (2.14) has also been derived independently by
Steck [7]. From (2.14) we get at once

$$|\varphi(t)| = \frac{1}{\bigl[\cosh\bigl(\frac{\pi}{2} t\bigr)\bigr]^{1/2}} \tag{2.15}$$

and then verify easily that $\int_{-\infty}^{\infty} |\varphi(t)|\, dt < \infty$; that is, the characteristic function
$\varphi(t)$ is absolutely integrable. Then using the well-known theorem ([2], p. 188), we
deduce easily that the distribution function of $\ln |x|$ is absolutely continuous
and has a continuous probability density function. Thus it follows as an
immediate consequence that $|x|$ has an absolutely continuous distribution function.
Finally from the relation (2.13) we see easily that $F(x)$ is also absolutely
continuous and has a continuous probability density function.
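The "elementary integration" behind the characteristic function of $\ln |w|$ can be checked numerically. The sketch below rests on one fact that is easy to verify but is not spelled out in the text: if $w$ is standard Cauchy, then $\ln |w|$ has density $1/(\pi \cosh u)$ on the real line. The truncation width and step count are arbitrary illustrative choices.

```python
import math


def cf_log_abs_cauchy(t, half_width=40.0, steps=80000):
    """Trapezoidal approximation of E[exp(i t ln|w|)] for standard Cauchy w.

    ln|w| has the symmetric density 1 / (pi * cosh(u)), so the imaginary
    part vanishes and only the cosine integral is needed.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        u = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.cos(t * u) / (math.pi * math.cosh(u))
    return total * h


def closed_form(t):
    """The value 1 / cosh(pi t / 2) quoted in the proof."""
    return 1.0 / math.cosh(0.5 * math.pi * t)


for t in (0.0, 0.5, 1.0, 2.0):
    print(t, cf_log_abs_cauchy(t), closed_form(t))
```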


We are now in a position to prove the following theorem.

Theorem 2.1. Let x and y be two independently and identically distributed random
variables having a common distribution function $F(x)$. Let the quotient $w = x/y$
follow the Cauchy law distributed symmetrically about the origin $w = 0$. Then $F(x)$
has the following general properties:

(1) it is symmetric about the origin $x = 0$;

(2) it is absolutely continuous and has a continuous probability density function
$f(x) = F'(x) > 0$;

(3) the random variable $x$ has an unbounded range;

(4) the probability density function $f(x)$ satisfies the integral equation

$$\int_0^{\infty} f(x) f(ux)\, x\, dx = \frac{c_0}{1 + u^2} \tag{2.16}$$

holding for all $u$, where $c_0$ is a constant.

Proof. The properties (1) and (2) follow as direct consequences of Lemmas
2.1 and 2.2. For the proof of property (3) we proceed as follows:

Let us suppose that the random variable $x$ has a bounded range, that is, $F(x)$
is contained in a finite interval $(-a, +a)$ of the $x$-axis. We introduce the polar
transformation $x = r \cos\theta$ and $y = r \sin\theta$ and deduce easily that the joint
probability density function of $r$ and $\theta$ has the form

$$r\, f(r \cos\theta) f(r \sin\theta). \tag{2.17}$$

We now integrate (2.17) with respect to $r$ and obtain the probability density
function of $\theta$ as:

$$\begin{aligned}
f_1(\theta) &= \int_0^{a \sec\theta} f(r \cos\theta) f(r \sin\theta)\, r\, dr \quad \text{for } 0 \le \theta \le \pi/4, \\
f_2(\theta) &= \int_0^{a \operatorname{cosec}\theta} f(r \cos\theta) f(r \sin\theta)\, r\, dr \quad \text{for } \pi/4 \le \theta \le \pi/2.
\end{aligned} \tag{2.18}$$

Finally substituting $\cot\theta = x/y$ we get at once from (2.18) that if the random
variable $x$ has a bounded range $(-a, +a)$ the form of the probability density
function of $w = x/y$ in the range $(0 \le w \le 1)$ is different from that in the range
$(1 \le w \le \infty)$. The contradiction thus obtained leads to the proof of (3).

For the proof of (4) we introduce as usual the polar transformation $x = r \cos\theta$
and $y = r \sin\theta$ and integrate (2.17) with respect to $r$ over the range $(0, \infty)$. We
further note that $\theta = \operatorname{arc\,cot} x/y$ has a uniform distribution. Thus the equation
for the probability density function of $\theta$ is given by

$$\int_0^{\infty} f(r \cos\theta) f(r \sin\theta)\, r\, dr = c_0 \tag{2.19}$$

where $c_0$ is a constant. Then substituting $x = r \cos\theta$ and $u = \tan\theta$ in (2.19) we
get (2.16). Thus the problem of determining the entire class of distribution laws
$F(x)$ is equivalent to that of complete enumeration of the solutions of the
integral equation (2.16). This problem is very difficult and still remains to be solved.
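As a sanity check, the standard normal density should satisfy an integral equation of the form $\int_0^\infty f(x) f(ux)\, x\, dx = c_0/(1 + u^2)$, which is what the substitution $x = r\cos\theta$, $u = \tan\theta$ in (2.19) produces. The sketch below assumes the range of integration $(0, \infty)$ and unit variance; under those assumptions a direct calculation gives $c_0 = 1/(2\pi)$. The upper cut-off and step count are arbitrary illustrative choices.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def lhs(u, upper=12.0, steps=12000):
    """Trapezoidal approximation of the integral of f(x) f(u x) x over (0, upper)."""
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        x = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_pdf(x) * normal_pdf(u * x) * x
    return total * h


C0 = 1.0 / (2.0 * math.pi)  # constant obtained analytically for the normal case

for u in (0.0, 0.5, 1.0, 3.0):
    # (1 + u^2) times the integral should not depend on u.
    print(u, lhs(u) * (1.0 + u * u), C0)
```

A non-normal solution of the same equation would pass this check as well, which is the point of the counterexamples discussed in the paper.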
3. A characterization of the normal law. We shall now derive a characterization
of the normal distribution under some additional conditions on the distribution
function $F(x)$. For this purpose, we give first some analytical lemmas
which are also of independent interest.

Lemma 3.1. Let $\Phi(z)$ be a decomposable characteristic function which is regular
(analytic) in a strip $-a < \operatorname{Im} z < +a$ $(a > 0)$ of the complex $z$-plane. Let $\Phi_1(z)$
be a factor of $\Phi(z)$. Then the characteristic function $\Phi_1(z)$ is also regular at least in
the strip $-a < \operatorname{Im} z < +a$.

This lemma on the factorization of analytic characteristic functions is due to
Raikov [5]. A proof of this lemma is presented by Loève ([2], p. 213).

Lemma 3.2. Let $\Phi(z)$ be a decomposable regular (analytic) characteristic function
and $\Phi_1(z)$ a factor of $\Phi(z)$. Let $\Phi(-iv)$ exist for some $v$ ($v \ne 0$ real). Then for this
$v$, $\Phi_1(-iv)$ must also exist. Further, there always exist two finite real numbers
$K > 0$ and $a \ge 0$ not depending on $v$ such that the inequality

$$\Phi_1(-iv) < K e^{a|v|}\, \Phi(-iv) \tag{3.1}$$

is satisfied.

This lemma is also due to Raikov [5]. A proof of this lemma is presented by
Loève ([2], p. 214).

Lemma 3.3. Under the same conditions as in Lemma 3.2, let $z = t + iv$ ($t$ and $v$
both real). Then we have the inequality

$$|\Phi_1(-z)| < K e^{a|v|}\, \Phi(-iv). \tag{3.2}$$

The proof follows at once from (3.1) and the well-known property of the positive
definite functions

$$\max_{-\infty < t < +\infty} |\Phi(t + iv)| \le \Phi(iv), \quad (t \text{ and } v \text{ both real}).$$

LEMMA 3.4. Let f(x) be a continuous non-negative function of the real variable x.
Let the integral ∫_0^∞ x^v f(x) dx exist for all real v > 0. Then the integral


    I(z) = \int_0^\infty x^{-iz} f(x)\, dx


as a function of the complex variable z is regular (analytic) in the upper half plane
Im z > 0. Conversely if the function I(z) is regular in the upper half plane Im z > 0,
then the integral ∫_0^∞ x^v f(x) dx exists for all real v > 0.

PROOF. We first note that I(z) is uniformly convergent in every closed domain
of the half plane Im z > 0. Then using the well known theorems on regular
functions ([6], pp. 107, 116) we derive that I(z) is regular in the half plane Im z > 0.
The proof of the converse statement is obvious.

From Lemma 3.4, it is also easy to see that if the integral ∫_0^∞ x^v f(x) dx exists
for all v > 0, then the integral ∫_0^∞ x^{iz} f(x) dx (z complex) is regular in the lower
half plane Im z < 0.

LEMMA 3.5. Under the same conditions as in Theorem 2.1, let the distribution law
F(x) have finite moments of all orders. Let φ(t) = E(e^{it ln|x|}) denote the
characteristic function of the distribution of ln|x|. Then φ(z) = E(e^{iz ln|x|}) as a function
of the complex variable z is regular in the region Im z < 1.

PROOF. Since F(x) has finite moments of all orders, the integral ∫_0^∞ x^v f(x) dx
is convergent for all v > 0, where f(x) is the probability density function. We
further note that f(x) is symmetric about the origin x = 0. Then applying Lemma
3.4, we get easily that


(3.3)    \varphi(z) = E(e^{iz \ln|x|}) = 2 \int_0^\infty x^{iz} f(x)\, dx


is regular at least in the lower half plane Im z < 0.

Next we note that the characteristic function 1/cosh[(π/2)t] can be continued
in the complex z-plane since 1/cosh[(π/2)z] is regular in the strip |Im z| < 1.
Then applying Lemma 3.1 to the relation (2.14), we deduce at once that φ(t)
can also be continued in the complex z-plane and further φ(z) is also regular at
least in the strip |Im z| < 1. Thus combining the two results we conclude that
φ(z) is regular in the region Im z < 1. Similarly we see that φ(-z) is regular in
the region Im z > -1.

We are now in a position to prove the following theorem. 

THEOREM 3.1. In addition to the conditions of Theorem 2.1, if the following two
conditions are satisfied:

(1) F(x) has finite moments of all orders,

(2) φ(z) = E(e^{iz ln|x|}) has no zeros in its region of regularity (z complex), then
F(x) is normal.

We must note in this connection that the condition (2) is essential for the 
theorem. In the next section we shall give an example to show that the theorem is 
not true if the condition (2) is not satisfied. 

PROOF. We examine more closely the equation


(3.4)    \varphi(z)\, \varphi(-z) = \frac{1}{\cosh(\pi z / 2)}

for complex values of z.


For further investigation, we have to study the analytical behaviour of the
function cosh[(π/2)z] in the complex z-plane. We note that cosh[(π/2)z] is an
entire function of order unity having simple zeros at the points z = ±i(2k + 1),
k = 0, 1, 2, ... on the imaginary axis. Then applying the decomposition theorem
([6], p. 299), we have the canonical representation of cosh[(π/2)z] as:


(3.5)    \cosh(\pi z / 2) = \prod_{k=0}^{\infty} \left( 1 + \frac{z^2}{a_k^2} \right)


where a_k = 2k + 1; k = 0, 1, 2, ... . It is also easy to verify that the condition
Σ_{k=0}^∞ 1/a_k^2 < ∞ is satisfied.
From the conditions of Theorem 3.1 and Lemma 3.5 it follows that the
characteristic function φ(z) is regular in the region Im z < 1 and has no zeros in
this region. We now factorize φ(z) in the following manner:


(3.6)    \varphi(z) = \frac{1}{\sqrt{\pi}}\, \Gamma\!\left( \frac{1 + iz}{2} \right) \theta(z).

From the elementary properties of the Gamma function ([6], p. 313) it can be
verified easily that Γ((1 + iz)/2) is a meromorphic function which is regular
everywhere in the region Im z < 1, real on the imaginary axis and has no zeros
in its region of regularity. We also note that its reciprocal 1/Γ((1 + iz)/2) is
an entire function of order unity having simple zeros at the points z = i(2k + 1),
k = 0, 1, 2, ... all located on the imaginary axis ([6], p. 415). Hence using the
factorization theorem of Hadamard ([6], p. 332) we get


(3.7)    \frac{1}{\Gamma\!\left( \frac{1 + iz}{2} \right)} = \frac{1}{\sqrt{\pi}}\, e^{ipz} \prod_{k=0}^{\infty} \left( 1 - \frac{z}{i a_k} \right) e^{z/(i a_k)}


where p ≠ 0 real; a_k = 2k + 1, k = 0, 1, 2, ... . Thus the function θ(z)
introduced in (3.6) must also be regular at least in the region Im z < 1, real on the
imaginary axis and without any zeros. From (3.6) we get


(3.8)    \varphi(z)\, \varphi(-z) = \frac{1}{\pi}\, \Gamma\!\left( \frac{1 + iz}{2} \right) \Gamma\!\left( \frac{1 - iz}{2} \right) \theta(z)\, \theta(-z).


Again it is easy to verify from (3.5) and (3.7)


(3.9)    \Gamma\!\left( \frac{1 + iz}{2} \right) \Gamma\!\left( \frac{1 - iz}{2} \right) = \frac{\pi}{\cosh(\pi z / 2)}.


Hence using (3.8) and (3.9) we get easily from (3.4) that

(3.10)    \theta(z)\, \theta(-z) = 1


holding for complex values of z. But we note that θ(-z) is regular at least in
the region Im z > -1 and has no zeros in its region of regularity. Hence 1/θ(-z)
is also regular at least in the region Im z > -1 and without any zeros in this
region. Then using the relation (3.10) it follows easily that θ(z) is regular
everywhere throughout the complex plane, that is, it is an entire function. We note
further that θ(z) has no zeros in the complex plane.
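
The product representation (3.5) and the Gamma-function identity (3.9) can be spot-checked numerically. The Python sketch below is an illustration only; the function names and the truncation length are assumptions made here. It truncates the product in (3.5) for real z, and it evaluates (3.9) on the imaginary axis z = iv with |v| < 1, where both Gamma arguments are real and cosh(πz/2) reduces to cos(πv/2).

```python
import math

def cosh_product(z, terms=200_000):
    """Truncated product (3.5): prod_{k < terms} (1 + z^2 / a_k^2), a_k = 2k + 1 (real z)."""
    prod = 1.0
    for k in range(terms):
        a_k = 2 * k + 1
        prod *= 1.0 + (z * z) / (a_k * a_k)
    return prod

def gamma_pair(v):
    """Left side of (3.9) at z = i*v, |v| < 1: Gamma((1 - v)/2) * Gamma((1 + v)/2)."""
    return math.gamma((1 - v) / 2) * math.gamma((1 + v) / 2)

if __name__ == "__main__":
    for z in (0.5, 1.0, 2.0):
        print("z =", z, cosh_product(z), math.cosh(math.pi * z / 2))
    for v in (0.0, 0.3, 0.9):
        print("v =", v, gamma_pair(v), math.pi / math.cos(math.pi * v / 2))
```

The truncated product converges slowly (the neglected tail is of order z^2 divided by four times the number of terms), so agreement to five or six significant figures is what one should expect here.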

We next prove that the order of the entire function θ(z) cannot exceed unity.
We apply the inequality (3.2) to the relation (3.4) and using the expression for
φ(z) in (3.6), we get after a little rearrangement


alv| a\e| 


(3.11) | @(z) | cos ( ») <Kv/xr- ivciielanuniaieenia il Kvn: etait 


CS") rCS*) 


where z = t + iv (t and v both real).
But the right hand side of (3.11) is an entire function of order unity. Hence
from (3.11) it follows that the order of the entire function θ(z) cannot exceed
unity. Again we have already proved that θ(z) has no zeros throughout the
complex plane. Hence using the factorization theorem of Hadamard, we get θ(z) =
e^{cz}. Since θ(z) is real on the imaginary axis we get θ(z) = e^{iaz} where a is real.
Thus we obtain from (3.6)


(3.12)    \varphi(-z) = \frac{1}{\sqrt{\pi}}\, \Gamma\!\left( \frac{1 - iz}{2} \right) e^{-iaz}.

Next we substitute z = iv (v > 0 real) in (3.12) and get

(3.13)    \varphi(-iv) = 2 \int_0^\infty x^v f(x)\, dx = \frac{1}{\sqrt{\pi}}\, \Gamma\!\left( \frac{1 + v}{2} \right) e^{av}.


Since the distribution of x is symmetric about the origin, all the moments of
odd order are equal to zero and a moment of the even order 2k is given by


(3.14)    \mu_{2k} = \int_{-\infty}^{\infty} x^{2k} f(x)\, dx = 2 \int_0^\infty x^{2k} f(x)\, dx.


Finally substituting v = 2k (k a positive integer) in (3.13) we have


(3.15)    \mu_{2k} = \frac{1}{\sqrt{\pi}}\, \Gamma\!\left( k + \tfrac{1}{2} \right) e^{2ak} = \frac{(2k)!}{2^k\, k!}\, \sigma^{2k},

where σ = e^a/√2.
The proof of Theorem 3.1 follows at once from the fact that the moments in
(3.15) determine uniquely the normal distribution with mean zero and variance
σ^2.
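
The last step, that the moments obtained from (3.13) with v = 2k coincide with the even moments of a normal law with σ = e^a/√2, can be checked numerically. The following Python sketch is illustrative only; the function names and the sample value of a are assumptions chosen here, not quantities from the paper.

```python
import math

def moment_from_313(k, a):
    """mu_{2k} from (3.13) with v = 2k: Gamma(k + 1/2) * exp(2*a*k) / sqrt(pi)."""
    return math.gamma(k + 0.5) * math.exp(2 * a * k) / math.sqrt(math.pi)

def normal_even_moment(k, sigma):
    """Even moment (2k)! / (2^k k!) * sigma^(2k) of a normal law with mean 0, variance sigma^2."""
    return math.factorial(2 * k) / (2 ** k * math.factorial(k)) * sigma ** (2 * k)

if __name__ == "__main__":
    a = 0.3  # hypothetical value of the real constant a in theta(z) = exp(i*a*z)
    sigma = math.exp(a) / math.sqrt(2)
    for k in range(1, 8):
        print(k, moment_from_313(k, a), normal_even_moment(k, sigma))
```

Both columns agree because Γ(k + 1/2)/√π = (2k)!/(4^k k!), so that the factor e^{2ak}/2^k becomes σ^{2k}.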


4. An example. The non-normal distribution functions constructed in [1], [7]
have moments only up to a certain finite order. Here we give an example of a
non-normal distribution having finite moments of all orders. We shall now
construct a characteristic function φ(z) which satisfies the basic equation (3.4), is
regular in the region Im z < 1, but has zeros in its region of regularity so that
condition (2) of Theorem 3.1 is violated. We first give two lemmas.

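LEMMA 4.1. Let


(4.1)    P(t) = \frac{ \left( 1 + \frac{t}{i\gamma} \right) \left( 1 + \frac{t}{i\bar\gamma} \right) }{ \left( 1 - \frac{t}{i\alpha} \right) \left( 1 - \frac{t}{i\gamma} \right) \left( 1 - \frac{t}{i\bar\gamma} \right) }

where γ = α + iβ; γ̄ = α - iβ and α > 0, β > 0 both real. Then P(t) is always
a characteristic function whenever the relation β ≥ 2√2 α is satisfied. The proof
follows from a more general result on rational characteristic functions ([4],
p. 721).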

LEMMA 4.2. Let Q(z) be an entire function of order unity having only purely
imaginary zeros. Then its reciprocal 1/Q(z) is always a characteristic function.
The proof follows from the result ([3], p. 140).
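Next we define the quantities


(4.2)    \alpha_k = 2k + 1,                        k = 0, 1, 2, ..., N, N + 1, ...
         \beta_k = 2\sqrt{2}\, \alpha_k,           k = 0, 1, 2, ..., N    (N > 0)
         \gamma_k = \alpha_k + i\beta_k,           k = 0, 1, 2, ..., N
         \bar\gamma_k = \alpha_k - i\beta_k,       k = 0, 1, 2, ..., N


and construct the function φ(z) as:


(4.3)    \varphi(z) = \prod_{k=0}^{N} \frac{ \left( 1 + \frac{z}{i\gamma_k} \right) \left( 1 + \frac{z}{i\bar\gamma_k} \right) }{ \left( 1 - \frac{z}{i\alpha_k} \right) \left( 1 - \frac{z}{i\gamma_k} \right) \left( 1 - \frac{z}{i\bar\gamma_k} \right) e^{z/(i\alpha_k)} } \cdot \prod_{k=N+1}^{\infty} \frac{1}{ \left( 1 - \frac{z}{i\alpha_k} \right) e^{z/(i\alpha_k)} }

                    = P_1(z) \cdot P_2(z).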


From Lemma 4.1 it follows that P,(z) is a characteristic function, while we get 
as an immediate consequence of Lemma 4.2 that P2(z) is also a characteristic 
function. Hence φ(z) in (4.3) is a characteristic function. It is also easy to verify
that φ(z) is regular in the region Im z < 1 and has simple zeros at the points
z = -iα_k ± β_k (k = 0, 1, 2, ..., N) inside the region, where α_k and β_k are defined
in (4.2). We also see easily that φ(z) satisfies the basic equation (3.4). Then
we take φ(z) in (4.3) as the characteristic function of the distribution of ln|x|
and verify at once that the corresponding distribution function F(x) has moments
of all orders, but is not normal and the quotient x/y follows the Cauchy law.
In conclusion the author wishes to express his thanks to Professor Eugene
Lukacs for some helpful comments.


Columbia University, Stanford University, and Stanford University


1. Summary. Several modifications of the Dodge CSP-1 procedure [1] are 
presented. Changes are made in the rule of action when a defective item is 
observed while on sampling. The Average Outgoing Quality Limit (AOQL) for 
these new procedures is derived without the assumption of control. These
results are compared with the AOQL assuming control. A production process is 
said to be in statistical control if there is a constant probability p that an item 
is defective, and if the states of all the items (defective or nondefective) are 
stochastically independent. Further, the AOQL for the CSP-1 procedure using 
probability sampling (looking at every item with probability 1/k when on samp- 
ling) is derived without the assumption of control. 


2. Introduction and results. Two continuous sampling procedures are con- 
sidered. The first procedure is denoted by CSP-4² and is as follows:

a) At the outset, inspect 100 per cent of the units consecutively as produced
and continue such inspection until i units in succession are found clear of defects.

b) When i units in succession are found clear of defects, discontinue 100 per
cent inspection, and inspect only a fraction 1/k of the units, choosing the item
to be observed at random from a segment of size k (this type of sampling will be
called random sampling).

c) If a sample unit is found defective revert immediately to 100 per cent
inspection, eliminating from the production process the remaining (k - 1) items
in the segment, and commencing 100 per cent inspection with the next item
following the eliminated segment. Continue 100 per cent inspection until again i
units in succession are found clear of defects, as in paragraph (a).

d) Correct or replace with good units all defective units found. 

It is important to discuss the implications of (c). These eliminated units can 
be considered as a source of good items for (d). Furthermore, under certain 
mathematical models for the production process such as “a state of statistical
control,” condition (c) is equivalent to the following:

    If a sample unit is found defective revert immediately to 100 per cent
    inspection, commencing such inspection with the segment in which the
    defective item is observed. Continue 100 per cent inspection until again i
    units in succession are found clear of defects, as in paragraph (a).

The second continuous sampling procedure considered will be denoted by 
CSP-5 and is the same as CSP-4 except for condition (c) which is as follows: 
Received October 20, 1958; revised June 18, 1959. 

1 This work was sponsored by the Office of Naval Research under contract N6onr-25126. 


² CSP-2 and CSP-3 have already been used to denote other continuous sampling
procedures.

c′) If a sample unit is found defective screen the remaining k - 1 items in
the segment. Upon completion of this screening, commence 100 per cent
inspection with the next item produced. Continue 100 per cent inspection until again
i units in succession, not including the k - 1 screened items, are found clear of
defects, as in paragraph (a).
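
The rules (a), (b), (c) [or (c′)] and (d) can be turned into a small Monte Carlo experiment. The Python sketch below is an illustration under stated assumptions, not the authors' procedure for computing anything: it assumes a state of statistical control (each item independently defective with probability p, 0 < p < 1), and the function name, the default number of cycles and the seed are choices made here. It counts, per cycle, the items contributed to the outgoing product and the defectives among them.

```python
import random

def simulate_aoq(i, k, p, plan, cycles=20_000, seed=0):
    """Monte Carlo average outgoing quality for CSP-4 (plan=4) or CSP-5 (plan=5)
    when every item is independently defective with probability p."""
    if plan not in (4, 5):
        raise ValueError("plan must be 4 or 5")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    rng = random.Random(seed)
    outgoing = 0      # items contributed to the outgoing product
    passed_bad = 0    # defective items that reach the outgoing product
    for _ in range(cycles):
        # (a) 100 per cent inspection until i consecutive good items are seen;
        # by (d) every defective found is replaced by a good unit.
        run = 0
        while run < i:
            outgoing += 1
            run = 0 if rng.random() < p else run + 1
        # (b) partial inspection: one item chosen at random per block of k.
        while True:
            block = [rng.random() < p for _ in range(k)]
            if block[rng.randrange(k)]:
                # (c): CSP-4 eliminates the other k - 1 items, so only the
                # corrected sample unit goes out; (c'): CSP-5 screens them,
                # so all k items go out free of defects.
                outgoing += 1 if plan == 4 else k
                break
            outgoing += k
            passed_bad += sum(block)  # the sampled item was good; the rest pass unseen
    return passed_bad / outgoing

if __name__ == "__main__":
    print(simulate_aoq(i=5, k=4, p=0.1, plan=4))
    print(simulate_aoq(i=5, k=4, p=0.1, plan=5))
```

Under control the estimates should settle near (k - 1)pq^{i+1}/[1 + (k - 1)q^{i+1}] for CSP-4 and (k - 1)pq^{i+1}/[1 + (k - 1)q^{i}] for CSP-5, with q = 1 - p, which are the closed forms derived in Section 4.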

These procedures differ from the Dodge CSP-1 procedure in paragraphs (b) and 
(c). Dodge’s statements [1] analogous to (b) and (c) are as follows: 

    When i units in succession are found clear of defects, discontinue 100%
    inspection and inspect only a fraction 1/k of the units, selecting individual
    sample units one at a time from the flow of product, in such a manner as to
    assure an unbiased sample.

    If a sample unit is found defective, revert immediately to a 100%
    inspection of succeeding units and continue until again i units in succession are
    found clear of defects, as in paragraph (a).

It is not immediately evident what Dodge meant by the phrase, “..., selecting
individual sample units one at a time from the flow of product, in such a manner
as to assure an unbiased sample.” However, Dodge did study properties of his
procedure and presented equations and charts for determining the Average Out- 
going Quality Limit (AOQL) as functions of the parameters k and i, under the 
assumption that the process is in a state of statistical control. There are several 
interpretations of the sampling procedure while on partial inspection which 
lead to Dodge’s operating characteristics under the assumption of control. 
These are as follows: (1) look at every kth item. This type of sampling is denoted 
as systematic sampling and has the practical disadvantage that the particular 
item to be chosen is known in advance. (2) sample every item with probability 
1/k. This type of sampling is denoted as probability sampling and has the dis- 
advantage that the number of uninspected items is a random variable. The 
result showing the coincidence of the operating characteristic using this type of 
sampling with CSP-1 is due to Resnikoff [2]. (3) sample only a fraction 1/k of 
the units, choosing the item to be observed at random from a segment of size k 
(random sampling). If the sample unit is found defective begin 100% inspection 
with the item following the segment in which the defective item was observed, 
allowing the k — 1 uninspected items to enter into the production stream. 

The CSP-4 and CSP-5 procedures are variations of this last type of samp- 
ling, i.e., random sampling. These procedures are investigated under the assump- 
tion of the existence of a state of statistical control and the AOQL’s so obtained 
do not coincide exactly with the values given by Dodge for CSP-1. More im- 
portant, however, the CSP-4 and CSP-5 procedures are analyzed without the 
assumption of the existence of a state of statistical control. 

The problem of determining an AOQL for a Dodge type procedure without 
the assumption that the process is in a state of statistical control was first con- 
sidered by Lieberman in [3], where it was shown that the CSP-1 procedure guar- 
antees an AOQL whether or not the process is in a state of statistical control. In 
fact, for this case the AOQL equals (k - 1)/(k + i). This result was obtained
under the hypothesis of random sampling while on partial inspection. The same
result is obtained in this paper under the hypothesis of probability sampling
while on partial inspection. For a given k and i, the above value of the AOQL is
always higher than that obtained using Dodge’s equations. This is to be expected 
since the AOQL, without the assumption of control, is the least upper bound of 
the average quality level that a production process is able to achieve. This is not 
to imply that this is the average outgoing quality of a typical production 
process, but rather, that the average outgoing quality of the process can never 
exceed this AOQL value. The production process that actually achieves this 
level is one which alternates between producing all defective items during partial 
inspection and producing all non-defective items during 100% inspection. 

It is the authors’ contention that the assumption of control is not always 
justified. Whereas a production process which achieves the AOQL found by 
Lieberman seems unlikely, it should be emphasized that deviations from control 
can produce values of the average outgoing quality ranging up to the AOQL 
found by Lieberman. 

It is intuitively clear that under CSP-4 and CSP-5 a production process which 
alternates between producing all defective items during partial inspection and 
producing all non-defective items during 100% inspection, will not represent the 
least favorable case. It is shown in this paper that both of these procedures 
guarantee a non-trivial AOQL whether or not the process is in a state of statisti- 
cal control. In fact, for CSP-4 


    AOQL = \begin{cases} \dfrac{(c_4 + 2) - 2\sqrt{c_4 + 1}}{c_4^2}, & c_4 \ne 0 \\ 1/4, & c_4 = 0 \end{cases}    where c_4 = (i - k + 1)/k.


The AOQL is actually achieved when the process alternates between producing

    d_4 = \begin{cases} \dfrac{k^2 \sqrt{(i + 1)/k} - k^2}{i - k + 1}, & i \ne k - 1 \\ k/2, & i = k - 1 \end{cases}

defective items in a block of size k during partial inspection and producing all
non-defective items during 100 per cent inspection. Similarly, for CSP-5


    AOQL = \frac{(c_5 + 2) - 2\sqrt{c_5 + 1}}{c_5^2}    where c_5 = i/k.


Note that the AOQL depends only on the ratio i/k, and not on the individual
values. This AOQL is achieved when the process alternates between producing


    d_5 = \frac{k^2 \sqrt{i/k + 1} - k^2}{i}


defective items in a block of size k during partial inspection and producing all
non-defective items during 100 per cent inspection.
Naturally, these results are always higher than those obtained assuming
control. However, the values of d given are not so high as to be unrealistic. For
example, if an operator knows that only 1 in k items is to be chosen at random and
observed, he may be careless enough to produce d defective items in this block,
whereas if he knows every item is to be looked at (100 per cent inspection) he
will be very careful and produce all good items. Hence, the AOQL values given
above may not be unreasonably large.
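
To give a feel for the magnitudes involved, the worst-case AOQL expressions for CSP-4 and CSP-5 and the corresponding block defect counts can be tabulated next to Lieberman's value (k - 1)/(k + i) for CSP-1. The Python sketch below is illustrative; the function names and the sample (i, k) pairs are assumptions made here.

```python
import math

def L(c):
    """Worst-case AOQL: ((c + 2) - 2*sqrt(c + 1)) / c^2 for c != 0, and 1/4 for c = 0."""
    if c == 0:
        return 0.25
    return ((c + 2) - 2 * math.sqrt(c + 1)) / (c * c)

def a_j(k, plan):
    """a_4 = k - 1 for CSP-4 and a_5 = 0 for CSP-5."""
    return k - 1 if plan == 4 else 0

def aoql_without_control(i, k, plan):
    """AOQL of CSP-4 (plan=4) or CSP-5 (plan=5) without the assumption of control."""
    return L((i - a_j(k, plan)) / k)

def worst_block_defectives(i, k, plan):
    """Defectives per block (as a continuous value) at which the AOQL is attained."""
    a = a_j(k, plan)
    if i == a:
        return k / 2
    return (k * k * math.sqrt((i - a) / k + 1) - k * k) / (i - a)

def aoql_csp1_without_control(i, k):
    """Lieberman's value (k - 1)/(k + i) for CSP-1."""
    return (k - 1) / (k + i)

if __name__ == "__main__":
    for i, k in [(20, 5), (50, 10), (9, 10)]:
        print(i, k,
              round(aoql_csp1_without_control(i, k), 4),
              round(aoql_without_control(i, k, 4), 4),
              round(aoql_without_control(i, k, 5), 4),
              round(worst_block_defectives(i, k, 4), 3),
              round(worst_block_defectives(i, k, 5), 3))
```

Since c + 2 - 2√(c + 1) = (√(c + 1) - 1)^2 and c = (√(c + 1) - 1)(√(c + 1) + 1), the function L(c) equals 1/(1 + √(1 + c))^2, which makes it evident that L decreases in c and equals 1/4 at c = 0.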

Finally, the CSP-4 or CSP-5 procedures are used in practice because of a 
reluctance to pass a segment in which a defective item has already been observed. 
Usually, the equations for the AOQL of CSP-1 under the assumption of control 
are used to find the necessary parameters i and k for the CSP-4 or CSP-5
procedures since this is a ‘conservative’ approximation. However, its conservatism
depends upon the realism of the assumption of control. It is interesting to point
out that the CSP-5 procedure guarantees that the AOQL will never exceed 25%
regardless of the choice of i and k.


3. Theorems and proof for the AOQL without the assumption of control for 
CSP-4 and CSP-5. Define 


D_{st} = number of defects produced in the sth block of the tth cycle,
D_{st} = 0, 1, ..., k for all s, t.


A cycle is the period from the time partial inspection begins to the time a defective is
observed. A block is a segment of k items produced while on partial inspection 
from which a single item is chosen at random for inspection. 


N_t = number of blocks (of k items) sampled in the tth cycle. It is pointed out
that the cycle terminates when a defective is found and that for the procedures 
considered the block in which the defective is drawn is not put directly into the 
production stream. However, it will still be considered as part of the tth cycle. 
Under CSP-4, the block is eliminated and under CSP-5, the block is screened 
replacing all defective items by good ones. 


X_t = total number of defects being passed in the tth cycle. X_t = \sum_{s=1}^{N_t - 1} D_{st}.


δ_{st} are zero-one random variables and indicate whether the sth item in the 100%
inspection sequence preceding the tth cycle of partial inspection is non-defective
or defective.


M_t = number of items inspected in the 100% inspection sequence preceding the
tth cycle of partial inspection. This is a sure function of δ_{st}.

A strategy of nature is characterized by a pair of doubly infinite sequences of
possibly dependent random variables


    { {D_{st}}, {δ_{st}} }.

Define the numbers L_j (j = 4, 5) as the smallest numbers with the property
that for every process the probability is zero that


(1)    \limsup_{m \to \infty} \frac{ \sum_{t=1}^{m} X_t }{ k \sum_{t=1}^{m} N_t - m a_j + \sum_{t=1}^{m} M_t } > L_j,


where


    a_j = \begin{cases} k - 1, & j = 4 \\ 0, & j = 5. \end{cases}

The numbers L_4 and L_5 are called the AOQL for CSP-4 and CSP-5 respectively.
It is evident that the ratio whose lim sup is taken in (1) is just the total number
of defectives contributed to the outgoing product in the first m cycles divided
by the total number of items contributed to the outgoing product in the m cycles.
It is clear that in order to determine L_j we may confine ourselves to consideration
of strategies of nature for which the number of cycles is infinite with
probability 1. Furthermore, if we choose {δ_{st}} = {0, 0, ..., 0, ...} with probability
1, independent of the past, we are assured that M_t = i, (t = 1, 2, ...), with
probability 1. Hence, any strategy of nature for which the δ_{st} are not of this form
is dominated by a corresponding strategy for which they are. Similarly it is
sufficient to consider the special class of strategies for which the number of defectives
in every block on partial inspection is ≥ 1. Hence, by confining ourselves to such
strategies we may characterize nature's strategy by the single infinite sequence
{D_{st}}, where the random variables D_{st} take on the values 1, 2, ..., k, with
probability 1. It then follows that


(2)    \limsup_{m \to \infty} \frac{ \sum_{t=1}^{m} X_t }{ k \sum_{t=1}^{m} N_t - m a_j + \sum_{t=1}^{m} M_t } \le \limsup_{m \to \infty} \frac{ \sum_{t=1}^{m} X_t }{ k \sum_{t=1}^{m} N_t - m a_j + m i },    (j = 4, 5).


THEOREM 1:³ For every strategy {D_{st}} of nature and for all m


(3)    \frac{ \sum_{t=1}^{m} E(X_t | D_t) }{ k \sum_{t=1}^{m} E(N_t | D_t) + m(i - a_j) } \le L(c_j)

where

(4)    L(c_j) = \begin{cases} \dfrac{(c_j + 2) - 2\sqrt{c_j + 1}}{c_j^2}, & c_j \ne 0 \\ 1/4, & c_j = 0 \end{cases}    (j = 4, 5)

(5)    c_j = \frac{i - a_j}{k}

and

(6)    D_t = \{ D_{1t}, D_{2t}, \ldots \}.


³ The authors are indebted to Professor S. Karlin for suggesting the method of proof
used in this theorem.

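PROOF: We may write

(7)    X_t = \sum_{s=1}^{\infty} D_{st} U_{st},    where U_{st} = \begin{cases} 1, & N_t > s \\ 0, & \text{otherwise.} \end{cases}


Hence,


(8)    E(X_t | D_t) = \sum_{s=1}^{\infty} D_{st}\, E(U_{st} | D_t)
                    = D_{1t} \left( 1 - \frac{D_{1t}}{k} \right) + D_{2t} \left( 1 - \frac{D_{1t}}{k} \right) \left( 1 - \frac{D_{2t}}{k} \right) + \cdots.


This is a geometric series that is bounded uniformly by the convergent series


    k \sum_{s=1}^{\infty} (1 - 1/k)^s.


Similarly,

(9)    N_t = 1 + \sum_{s=1}^{\infty} U_{st},

so that


(10)    E(N_t | D_t) = 1 + \left( 1 - \frac{D_{1t}}{k} \right) + \left( 1 - \frac{D_{1t}}{k} \right) \left( 1 - \frac{D_{2t}}{k} \right)
                       + \left( 1 - \frac{D_{1t}}{k} \right) \left( 1 - \frac{D_{2t}}{k} \right) \left( 1 - \frac{D_{3t}}{k} \right) + \cdots.


Again, this is uniformly bounded by a convergent geometric series.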
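From (8) it follows that


(11)    \sum_{t=1}^{m} E(X_t | D_t) = \sum_{t=1}^{m} \left[ D_{1t} \left( 1 - \frac{D_{1t}}{k} \right) + D_{2t} \left( 1 - \frac{D_{1t}}{k} \right) \left( 1 - \frac{D_{2t}}{k} \right) + \cdots \right]

        = \sum_{t=1}^{m} \left[ \frac{ D_{1t} (1 - D_{1t}/k) }{ k + (i - a_j) D_{1t}/k } \left( k + (i - a_j) \frac{D_{1t}}{k} \right) \right.

          + \frac{ D_{2t} (1 - D_{2t}/k) }{ k + (i - a_j) D_{2t}/k } \left( k + (i - a_j) \frac{D_{2t}}{k} \right) \left( 1 - \frac{D_{1t}}{k} \right)

          \left. + \frac{ D_{3t} (1 - D_{3t}/k) }{ k + (i - a_j) D_{3t}/k } \left( k + (i - a_j) \frac{D_{3t}}{k} \right) \left( 1 - \frac{D_{1t}}{k} \right) \left( 1 - \frac{D_{2t}}{k} \right) + \cdots \right].


From (10) it follows that


(12)    k \sum_{t=1}^{m} E(N_t | D_t) + m(i - a_j)
        = \sum_{t=1}^{m} \left[ k + k \left( 1 - \frac{D_{1t}}{k} \right) + k \left( 1 - \frac{D_{1t}}{k} \right) \left( 1 - \frac{D_{2t}}{k} \right) + \cdots + (i - a_j) \right].


Noting that


    \frac{D_{1t}}{k} + \frac{D_{2t}}{k} \left( 1 - \frac{D_{1t}}{k} \right) + \frac{D_{3t}}{k} \left( 1 - \frac{D_{1t}}{k} \right) \left( 1 - \frac{D_{2t}}{k} \right) + \cdots = 1,


since the left hand side is just the probability of ultimately achieving a success
when performing successive Bernoulli trials with success probabilities bounded
away from zero, we see that expression (12) can be written as


(13)    k \sum_{t=1}^{m} E(N_t | D_t) + m(i - a_j)
        = \sum_{t=1}^{m} \left[ \left( k + (i - a_j) \frac{D_{1t}}{k} \right) + \left( k + (i - a_j) \frac{D_{2t}}{k} \right) \left( 1 - \frac{D_{1t}}{k} \right) \right.

          \left. + \left( k + (i - a_j) \frac{D_{3t}}{k} \right) \left( 1 - \frac{D_{1t}}{k} \right) \left( 1 - \frac{D_{2t}}{k} \right) + \cdots \right].

Hence,


    \frac{ \sum_{t=1}^{m} E(X_t | D_t) }{ k \sum_{t=1}^{m} E(N_t | D_t) + m(i - a_j) }


is merely a non-negatively weighted average of quantities of the form


(14)    f(D_{st}, i, k) = \frac{ D_{st} (1 - D_{st}/k) }{ k + (i - a_j) D_{st}/k },    (j = 4, 5; s = 1, 2, ...)


and has an upper bound obtained by maximizing each of these expressions
independently. Taking the derivative of (14) with respect to the value d_{st} of D_{st}
(treated as a continuous variable) we obtain


(15)    f'(d_{st}, i, k) = \begin{cases} \dfrac{k - 2 d_{st}}{k^2 + (i - a_j) d_{st}} - \dfrac{(k d_{st} - d_{st}^2)(i - a_j)}{[k^2 + (i - a_j) d_{st}]^2}, & i \ne a_j \\ \dfrac{k - 2 d_{st}}{k^2}, & i = a_j. \end{cases}


The quantity f(d_{st}, i, k) is clearly maximized by setting (15) equal to zero.
Denoting the maximizing value of d_{st} by d_j since it is independent of s and t
we obtain


(16)    d_j = \begin{cases} \dfrac{k^2 \sqrt{(i - a_j)/k + 1} - k^2}{i - a_j}, & i \ne a_j \\ k/2, & i = a_j \end{cases}    (j = 4, 5).

It then follows that

(17)    \frac{ \sum_{t=1}^{m} E(X_t | D_t) }{ k \sum_{t=1}^{m} E(N_t | D_t) + m(i - a_j) } \le \frac{ d_j (1 - d_j/k) }{ k + (i - a_j) d_j/k }
        = \begin{cases} \dfrac{(c_j + 2) - 2\sqrt{c_j + 1}}{c_j^2}, & c_j \ne 0 \\ 1/4, & c_j = 0 \end{cases}
        = L(c_j),    (j = 4, 5),

where c_j = (i - a_j)/k.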
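
The maximization leading from (14) to (16) and (17) can be checked by brute force. The Python sketch below is an illustration only; the function names, the grid resolution and the sample values of i and k are assumptions made here. It compares a fine grid search over 0 < d < k with the closed forms.

```python
import math

def f(d, i, k, a):
    """Expression (14) with D_st replaced by a continuous value d."""
    return d * (1 - d / k) / (k + (i - a) * d / k)

def d_closed_form(i, k, a):
    """Maximizing value d_j from (16)."""
    if i == a:
        return k / 2
    return (k * k * math.sqrt((i - a) / k + 1) - k * k) / (i - a)

def L_closed_form(i, k, a):
    """Maximum value L(c_j) from (17), with c_j = (i - a)/k."""
    c = (i - a) / k
    if c == 0:
        return 0.25
    return ((c + 2) - 2 * math.sqrt(c + 1)) / (c * c)

def grid_argmax(i, k, a, steps=100_000):
    """Brute-force maximization of f over a grid of d values in (0, k)."""
    best_d, best_val = 0.0, 0.0
    for n in range(1, steps):
        d = k * n / steps
        val = f(d, i, k, a)
        if val > best_val:
            best_d, best_val = d, val
    return best_d, best_val

if __name__ == "__main__":
    i, k = 50, 10
    for plan, a in ((4, k - 1), (5, 0)):
        d_grid, val_grid = grid_argmax(i, k, a)
        print(plan, d_grid, d_closed_form(i, k, a), val_grid, L_closed_form(i, k, a))
```

Because the bound is attained only at a generally non-integer d_j, an actual process restricted to integer defect counts per block falls slightly short of L(c_j).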


THEOREM 2: For any strategy {D_{st}} of nature, for either CSP-4 or CSP-5


(18)    \lim_{m \to \infty} \left[ \frac{1}{m} \sum_{t=1}^{m} X_t - \frac{1}{m} \sum_{t=1}^{m} E(X_t | D_t) \right] = 0,

with probability 1, and


(19)    \lim_{m \to \infty} \left[ \frac{1}{m} \sum_{t=1}^{m} N_t - \frac{1}{m} \sum_{t=1}^{m} E(N_t | D_t) \right] = 0,

with probability 1.
PROOF: For t = 1, 2, ..., let

(20)    Z_t = X_t - E(X_t | D_t).

Then


(21)    E(Z_t | D_t) = E[ X_t - E(X_t | D_t) | D_t ] = 0


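so that E(Z_t) = 0. Furthermore, for t > s, Z_t and Z_s are conditionally
independent given D_t and D_s so that


(22)    E(Z_t Z_s) = E\{ E(Z_t Z_s | D_t, D_s) \} = E\{ E(Z_t | D_t)\, E(Z_s | D_s) \} = 0.


Now


(23)    E(Z_t^2) = E(X_t^2) - E[ E^2(X_t | D_t) ] \le E(X_t^2) \le k^2 E(N_t^2)
                 = k^2 E\{ E(N_t^2 | D_t) \} \le k^2 \sum_{s=1}^{\infty} s^2 \left( 1 - \frac{1}{k} \right)^{s-1} < \infty,
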
since D_{st} ≥ 1 with probability 1. Now by a well known Law of Large Numbers
for sums of orthogonal random variables ([4] Chapter IV, Theorem 5.2) equation
(22) together with the uniform boundedness of E(Z_t^2) shown by (23) implies
that


(24)    \lim_{m \to \infty} \frac{1}{m} \sum_{t=1}^{m} Z_t = 0,

with probability 1,
so that (18) is established. Letting Z_t' = N_t - E(N_t | D_t), the proof of (19) is
similar.
THEOREM 3. For any strategy {D_{st}} of nature
(25)    L_j = L(c_j)    (j = 4, 5).


PROOF. By Theorem 1 we have


(26)    \frac{1}{m} \sum_{t=1}^{m} E(X_t | D_t) - L(c_j) \left[ \frac{k}{m} \sum_{t=1}^{m} E(N_t | D_t) + (i - a_j) \right] \le 0,


for all m. If for each m we let


(27)    V_m = \frac{1}{m} \sum_{t=1}^{m} X_t - \frac{1}{m} \sum_{t=1}^{m} E(X_t | D_t),

and


(28)    V_m' = \frac{1}{m} \sum_{t=1}^{m} N_t - \frac{1}{m} \sum_{t=1}^{m} E(N_t | D_t),


then by Theorem 2, lim_{m→∞} V_m = lim_{m→∞} V_m' = 0 with probability 1. But from
(26) we have

(29)    \frac{ \sum_{t=1}^{m} X_t }{ k \sum_{t=1}^{m} N_t + m(i - a_j) } \le L(c_j) + \frac{ V_m - k L(c_j) V_m' }{ \frac{k}{m} \sum_{t=1}^{m} N_t + (i - a_j) },


and (25) follows upon taking the lim sup_{m→∞} of both sides of (29).
If we now let

(30)    d_j^* = \text{integer nearest to } \begin{cases} \dfrac{k^2 \sqrt{(i - a_j)/k + 1} - k^2}{i - a_j}, & i \ne a_j \\ k/2, & i = a_j \end{cases}    (j = 4, 5);

then we have
THEOREM 4: If the production process alternates between producing d_4^* [d_5^*]
defective items in blocks of size k during partial inspection and all non-defective items
during 100 per cent inspection, then for CSP-4 [CSP-5]


    \limsup_{m \to \infty} \frac{ \sum_{t=1}^{m} X_t }{ k \sum_{t=1}^{m} N_t + m(i - a_j) }


equals L_4(c) [L_5(c)] (approximately, due to the discreteness of d_4^* and d_5^*) and hence
the AOQL is given by L_4(c) [L_5(c)].

PROOF: This result follows immediately from (16) and Theorems 2 and 3.

We remark that it is easily verified by differentiation that L(c_5) ≤ lim_{c_5→0}
L(c_5) = 1/4, so that the AOQL ≤ 1/4 for CSP-5 for any choice of i and k. We
further remark that if defective items found when on 100 per cent inspection
are not replaced by good items but are discarded, the previously derived results
are still applicable, i.e., the AOQL is still given approximately by L(c_j). If,
under the CSP-4 procedure, a unit found defective while on sampling is also
discarded together with the remaining (k - 1) items and not replaced, the
previously derived results are also applicable provided that a_j is set equal to k.


4. CSP-4 and CSP-5 under control. This section will be devoted to determining 
the Average Outgoing Quality (AOQ) function and the AOQL for the CSP-4 
and CSP-5 procedures under the assumption of the existence of a state of sta- 
tistical control. 

The AOQ function is defined as

(31)    AOQ_j = \limsup_{m \to \infty} \frac{ \sum_{t=1}^{m} X_t }{ k \sum_{t=1}^{m} N_t - m a_j + \sum_{t=1}^{m} M_t }

              = \limsup_{m \to \infty} \frac{ \sum_{t=1}^{m} X_t / m }{ k \sum_{t=1}^{m} N_t / m - a_j + \sum_{t=1}^{m} M_t / m }    (j = 4, 5)

where

    a_j = \begin{cases} k - 1, & j = 4 \\ 0, & j = 5. \end{cases}

Under the assumption of the existence of a state of statistical control at level
p, the law of large numbers becomes applicable so that the AOQ function can
be expressed as


(32)    AOQ_j = \frac{ E(X_t) }{ k E(N_t) - a_j + E(M_t) }.

It is easily verified that

(33)    E(M_t) = \frac{1 - q^i}{p q^i},

(34)    E(N_t) = \frac{1}{p},

and

(35)    E(X_t) = (k - 1) q


where q = 1 - p. Hence,


(36)    AOQ_4 = \frac{ (k - 1)(q^{i+1} - q^{i+2}) }{ 1 + q^{i+1}(k - 1) } = \frac{ (k - 1) p q^{i+1} }{ 1 + (k - 1) q^{i+1} }.

The maximizing value of q for a fixed i and k is given by solving for q the
expression


(37)    (k - 1) q^{i+2} + (i + 2) q = (i + 1).

Denote this value by q_{max-4}. The AOQL can then be written as

(38)    AOQL_4 = 1 - q_{max-4}\, \frac{i + 2}{i + 1}

or, solving for q_{max-4}, the expression

(39)    q_{max-4} = (1 - AOQL_4)\, \frac{i + 1}{i + 2}


is obtained. Substituting this expression for q_{max-4} into (37) and solving for k,
the relationship between k and i for a fixed AOQL is obtained, i.e.,


(40)    k = 1 + \left( \frac{i + 2}{i + 1} \right)^{i+2} \frac{ (i + 1)\, AOQL_4 }{ (1 - AOQL_4)^{i+2} }.

For fixed k and i, the expression for the AOQL for the CSP-4 procedure
assuming control never exceeds the AOQL which is obtained without making any
assumptions about the behavior of the process. However, the differences are
much smaller for this procedure than for the CSP-1 procedure.
Similarly, for CSP-5, the AOQ function can be written as


(41)    AOQ_5 = \frac{ (q^{i+1} - q^{i+2})(k - 1) }{ 1 + (k - 1) q^{i} } = \frac{ (k - 1) p q^{i+1} }{ 1 + (k - 1) q^{i} }.


The maximizing value of q for a fixed i and k is given by solving for q the
expression


(42)    2(k - 1) q^{i+1} - (k - 1) q^{i} + (i + 2) q = i + 1.

Denote this value by q_{max-5}. The AOQL can then be written as

(43)    AOQL_5 = \frac{ (i + 1)\, q_{max-5} - (i + 2)\, q_{max-5}^2 }{ i }

or, solving for q_{max-5}, the expression

(44)    q_{max-5} = \frac{ (i + 1) + \sqrt{ (i + 1)^2 - 4 i (i + 2)\, AOQL_5 } }{ 2(i + 2) }

is obtained.


Substituting this expression for q_{max-5} into (42) and solving for k, the
relationship between k and i for a fixed AOQL is obtained, i.e.,


(45)    k = 1 + \frac{ (i + 1) - (i + 2)\, q_{max-5} }{ 2 q_{max-5}^{i+1} - q_{max-5}^{i} }.

Curves of constant AOQL derived from expressions (40), (44), and (45) are 
given in Figure 1. 
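
A point on a CSP-5 curve of constant AOQL can be computed by solving (42) numerically and then applying (43), (44) and (45). The Python sketch below is illustrative only; the bisection bracket [1/2, 1] (on which the left side of (42) minus its right side changes sign exactly once for i ≥ 1), the function names and the sample values are assumptions made here.

```python
import math

def aoq5(i, k, p):
    """AOQ function (41) for CSP-5 under control."""
    q = 1 - p
    return (k - 1) * p * q ** (i + 1) / (1 + (k - 1) * q ** i)

def q_max_5(i, k, tol=1e-14):
    """Root in (1/2, 1) of 2(k-1) q^(i+1) - (k-1) q^i + (i+2) q = i + 1, i.e. (42)."""
    def g(q):
        return 2 * (k - 1) * q ** (i + 1) - (k - 1) * q ** i + (i + 2) * q - (i + 1)
    lo, hi = 0.5, 1.0   # g(1/2) = -i/2 < 0 and g(1) = k > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def aoql5(i, k):
    """AOQL from (43)."""
    q = q_max_5(i, k)
    return ((i + 1) * q - (i + 2) * q * q) / i

def q_from_aoql5(i, aoql):
    """Expression (44): q_max-5 recovered from the AOQL."""
    return ((i + 1) + math.sqrt((i + 1) ** 2 - 4 * i * (i + 2) * aoql)) / (2 * (i + 2))

def k_from_q5(i, q):
    """Relationship (45) between k and i."""
    return 1 + ((i + 1) - (i + 2) * q) / (2 * q ** (i + 1) - q ** i)

if __name__ == "__main__":
    i, k = 20, 10
    limit = aoql5(i, k)
    q = q_from_aoql5(i, limit)
    print("AOQL_5 =", limit, "q_max-5 =", q, "k from (45) =", k_from_q5(i, q))
```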


5. CSP-1 without assuming control and using probability sampling. In this 
section, CSP-1 will be studied without assuming control but using a sampling 
procedure such that while on partial inspection, every item will be inspected with 
probability 1/k, or passed without inspection with probability (1—1/k). The nota- 
tion of Sections 2 and 3 will be used, but for this problem k need not be an integer 
but may be any number > 1. 

If we let N_t' denote the number of items contributed to the production stream
during the tth partial inspection cycle, then the AOQL is defined, as before, as 
the smallest number L with the property that for every strategy of nature the 
probability is zero that 


(46)    \limsup_{m \to \infty} \frac{ \sum_{t=1}^{m} X_t }{ \sum_{t=1}^{m} N_t' + \sum_{t=1}^{m} M_t } > L.


To obtain the AOQL it is again sufficient to consider the special class of strategies
of nature such that M_t = i for all t, and we must investigate the quantity


(47)    \limsup_{m \to \infty} \frac{ \sum_{t=1}^{m} X_t }{ \sum_{t=1}^{m} N_t' + m i }


for such strategies. For this problem a (randomized) strategy of nature may
be characterized by a double sequence of possibly dependent random variables
{P_{st}} where 0 ≤ P_{st} ≤ 1 with probability 1 for all s, t and where P_{st} is interpreted
as the probability that the sth item in the tth partial inspection cycle is defective.
As before we restrict our attention to strategies for which an infinite number of
partial inspection cycles will occur with probability 1.

[Fig. 1. Curves for Determining Values of k and i for A Given Value of AOQL for CSP-4 and CSP-5 under Control. Horizontal axis: i, number of units.]

Let R_t be the number of items passed until (and including) the first item
inspected during the tth cycle of partial inspection. Then the R_t's are
independently and identically distributed random variables with E(R_t) = k and,
furthermore, N_t' ≥ R_t for each t. Hence, by the Strong Law of Large Numbers


(48)    \liminf_{m \to \infty} \frac{1}{m} \sum_{t=1}^{m} N_t' \ge \lim_{m \to \infty} \frac{1}{m} \sum_{t=1}^{m} R_t = k,    with probability 1,


for any strategy {P_{st}} of nature.

We now prove two theorems which enable us to characterize the behavior of 
the numerator of (47). 

THEOREM 5: For any strategy of nature $\{P_{st}\}$

(49) $$E(X_t \mid P_t) = k - 1,$$

with probability 1 for all $t$, where $P_t = \{P_{1t}, P_{2t}, \cdots\}$.
PROOF: If for all $s, t$ we define

(50) $$Z_{st} = \begin{cases} 1, & \text{if the $s$th item in the $t$th cycle contributes a defective to the output,} \\ 0, & \text{otherwise,} \end{cases}$$

then for all $t$ we may represent $X_t$ by

(51) $$X_t = \sum_{s=1}^{\infty} Z_{st}.$$

Furthermore, since the probability that the $s$th item reached during the $t$th
partial inspection cycle is either not inspected or inspected and found non-defective
is given by $(1 - P_{st}/k)$, we have for all $s, t$

(52) $$E(Z_{st} \mid P_t) = \Bigl(1 - \frac{1}{k}\Bigr) P_{st} \prod_{j=1}^{s-1} \Bigl(1 - \frac{P_{jt}}{k}\Bigr),$$

where the empty product is interpreted as 1. Hence,

(53) $$E(X_t \mid P_t) = \Bigl(1 - \frac{1}{k}\Bigr) \sum_{s=1}^{\infty} P_{st} \prod_{j=1}^{s-1} \Bigl(1 - \frac{P_{jt}}{k}\Bigr).$$


We now establish the following equation for all $r \ge 1$ by induction:

(54) $$\sum_{s=1}^{r} P_{st} \prod_{j=1}^{s-1} \Bigl(1 - \frac{P_{jt}}{k}\Bigr) = k\Bigl[1 - \prod_{j=1}^{r}\Bigl(1 - \frac{P_{jt}}{k}\Bigr)\Bigr].$$

The equation clearly holds for $r = 1$, and if it is assumed true for $r = n$ then for
$r = n + 1$ the left hand side becomes

(55) $$k\Bigl[1 - \prod_{j=1}^{n}\Bigl(1 - \frac{P_{jt}}{k}\Bigr)\Bigr] + P_{n+1,t}\prod_{j=1}^{n}\Bigl(1 - \frac{P_{jt}}{k}\Bigr) = k\Bigl[1 - \prod_{j=1}^{n+1}\Bigl(1 - \frac{P_{jt}}{k}\Bigr)\Bigr],$$

and the proof by induction is complete.
We now remark that if the number of partial inspection cycles occurring is to
be infinite with probability 1, then we must have $\lim_{r\to\infty} P\{N_t > r\} = 0$ for
each $t$, which implies that

(56) $$\lim_{r\to\infty} P\{N_t > r \mid P_t\} = \lim_{r\to\infty} \prod_{j=1}^{r}\Bigl(1 - \frac{P_{jt}}{k}\Bigr) = 0,$$

with probability 1 for all strategies $\{P_{st}\}$ under consideration. The desired result
(49) now follows from (53), (54) and (56).
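
As an informal numerical check of Theorem 5, the series (53) can be evaluated for a long but finite list of defect probabilities. The sketch below is only an illustration: the function name, the two test sequences and the value $k = 4$ are assumptions made here, not part of the paper. Because every probability in the test sequences is bounded away from zero, the neglected tail of the series is negligible and the value should be close to $k - 1$.

```python
def expected_defectives_passed(p, k):
    """Evaluate the series (53) for a finite list p = [P_1t, P_2t, ...]."""
    total = 0.0
    survive = 1.0  # product over j < s of (1 - P_jt / k); the empty product is 1
    for p_s in p:
        total += p_s * survive
        survive *= 1.0 - p_s / k
    return (1.0 - 1.0 / k) * total


k = 4.0
constant = [0.5] * 2000                                           # P_st = 0.5 for every s
varying = [0.2 + 0.6 * ((7 * s) % 10) / 10 for s in range(2000)]  # values between 0.2 and 0.74

print(expected_defectives_passed(constant, k))
print(expected_defectives_passed(varying, k))
```

Both printed values should agree with $k - 1 = 3$ up to rounding, whatever pattern of probabilities is chosen, which is the content of (49).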
THEOREM 6. For any strategy of nature $\{P_{st}\}$

(57) $$E(X_t^2) \le 2(k - 1)^2 + (k - 1)$$

for all $t$.
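PROOF: As in Theorem 5 we have

(58) $$E(X_t^2 \mid P_t) = 2 \sum_{v=2}^{\infty}\sum_{w=1}^{v-1} E(Z_{vt}Z_{wt} \mid P_t) + \sum_{s=1}^{\infty} E(Z_{st} \mid P_t),$$

and for $v > w$

(59) $$E(Z_{vt}Z_{wt} \mid P_t) = \Bigl(1 - \frac{1}{k}\Bigr)^2 P_{vt}P_{wt} \prod_{\substack{s=1 \\ s \ne w}}^{v-1}\Bigl(1 - \frac{P_{st}}{k}\Bigr) = k\Bigl(1 - \frac{1}{k}\Bigr)^2 P_{vt}\,\frac{P_{wt}}{k - P_{wt}} \prod_{s=1}^{v-1}\Bigl(1 - \frac{P_{st}}{k}\Bigr).$$

Hence noting (49) of Theorem 5 we may write (58) as

(60) $$E(X_t^2 \mid P_t) = 2k\Bigl(1 - \frac{1}{k}\Bigr)^2 \sum_{v=2}^{\infty} P_{vt}\prod_{s=1}^{v-1}\Bigl(1 - \frac{P_{st}}{k}\Bigr)\sum_{w=1}^{v-1}\frac{P_{wt}}{k - P_{wt}} + (k - 1).$$

We now establish the following equation for all $r \ge 2$ by induction:

(61) $$\sum_{v=2}^{r} P_{vt}\prod_{s=1}^{v-1}\Bigl(1 - \frac{P_{st}}{k}\Bigr)\sum_{w=1}^{v-1}\frac{P_{wt}}{k - P_{wt}} = k\Bigl[1 - \Bigl(1 + \sum_{w=1}^{r}\frac{P_{wt}}{k - P_{wt}}\Bigr)\prod_{s=1}^{r}\Bigl(1 - \frac{P_{st}}{k}\Bigr)\Bigr].$$

It is easily verified that for $r = 2$ both sides of (61) are equal to $P_{1t}P_{2t}/k$. If
(61) is assumed to be true for $r = n$ then for $r = n + 1$ the left hand side may be
written as

(62) $$k\Bigl[1 - \Bigl(1 + \sum_{w=1}^{n}\frac{P_{wt}}{k - P_{wt}}\Bigr)\prod_{s=1}^{n}\Bigl(1 - \frac{P_{st}}{k}\Bigr)\Bigr] + P_{n+1,t}\prod_{s=1}^{n}\Bigl(1 - \frac{P_{st}}{k}\Bigr)\sum_{w=1}^{n}\frac{P_{wt}}{k - P_{wt}}$$
$$= k\Bigl[1 - \Bigl\{\Bigl(1 + \sum_{w=1}^{n}\frac{P_{wt}}{k - P_{wt}}\Bigr)\Bigl(1 - \frac{P_{n+1,t}}{k}\Bigr) + \frac{P_{n+1,t}}{k}\Bigr\}\prod_{s=1}^{n}\Bigl(1 - \frac{P_{st}}{k}\Bigr)\Bigr]$$
$$= k\Bigl[1 - \Bigl(1 + \sum_{w=1}^{n+1}\frac{P_{wt}}{k - P_{wt}}\Bigr)\prod_{s=1}^{n+1}\Bigl(1 - \frac{P_{st}}{k}\Bigr)\Bigr],$$

which is the right hand side of (61) with $r = n + 1$, so that the proof by induction
is complete.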
Now (60) and (61) imply that

(63) $$E(X_t^2 \mid P_t) \le 2k^2\Bigl(1 - \frac{1}{k}\Bigr)^2 + (k - 1) = 2(k - 1)^2 + (k - 1),$$

and the desired result (57) follows. An examination of (61) shows that if the
$P_{st}$'s are (for example) bounded away from zero then equality holds in (57).
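
The equality case mentioned above can be checked numerically. The following sketch computes the conditional second moment directly from the pair probabilities (the probability that items $w < v$ both contribute a defective) for a finite list of defect probabilities. The function name and test values are illustrative assumptions, and the series is truncated, so the result is only an approximation.

```python
def second_moment_passed(p, k):
    """Approximate E(X_t^2 | P_t) for a finite list p = [P_1t, P_2t, ...]."""
    q = 1.0 - 1.0 / k   # probability that a given item is not inspected
    first = 0.0         # running sum over s of E(Z_st | P_t)
    cross = 0.0         # running sum over v > w of E(Z_vt Z_wt | P_t)
    survive = 1.0       # product over j < v of (1 - P_jt / k)
    weight = 0.0        # sum over w < v of P_wt / (1 - P_wt / k)
    for p_v in p:
        first += q * p_v * survive
        cross += q * q * p_v * survive * weight
        weight += p_v / (1.0 - p_v / k)
        survive *= 1.0 - p_v / k
    return 2.0 * cross + first


k = 4.0
bound = 2.0 * (k - 1.0) ** 2 + (k - 1.0)   # right hand side of (57)
print(second_moment_passed([1.0] * 600, k), bound)
print(second_moment_passed([0.3] * 600, k), bound)
```

For both constant sequences the computed value should match the bound $2(k-1)^2 + (k-1) = 21$ to within truncation and rounding error.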
We now prove the main result of this section. 
THEOREM 7. For CSP-1 with probability sampling the AOQL is given by

(64) $$L = \frac{k - 1}{k + i},$$

and this value of $L$ is achieved by (47) when nature's strategy is to produce all
defective items during partial sampling and all non-defective items during 100%
sampling.
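
The extremal strategy named in the theorem is easy to simulate. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the function name, the parameter values and the convention that the inspected item ending a partial inspection cycle is counted in $N_t$ are all choices made here. Under that strategy each 100% inspection cycle contributes exactly $i$ non-defective items, and each partial inspection cycle passes defectives until the first item is inspected.

```python
import random


def simulate_worst_case(k, i, cycles, seed=0):
    """Estimate the ratio in (47) when every partially inspected item is defective."""
    rng = random.Random(seed)
    defectives_passed = 0   # running sum of X_t
    partial_items = 0       # running sum of N_t
    for _ in range(cycles):
        # Each item is inspected with probability 1/k; the first inspected item
        # is found defective and ends the partial inspection cycle.
        n_t = 1
        while rng.random() >= 1.0 / k:
            n_t += 1
        partial_items += n_t
        defectives_passed += n_t - 1
    # Every 100% inspection cycle adds exactly i non-defective items.
    return defectives_passed / (partial_items + cycles * i)


k, i = 5.0, 10
estimate = simulate_worst_case(k, i, cycles=200_000)
limit = (k - 1.0) / (k + i)
print(estimate, limit)
```

With these assumed values the estimate should settle near $(k-1)/(k+i) = 4/15$, in line with the stated AOQL.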
PROOF: The results of Theorems 5 and 6 together with the argument used in
Theorem 2 imply that

(65) $$\lim_{m\to\infty} \frac{1}{m}\sum_{t=1}^{m} X_t = k - 1, \quad \text{with probability 1,}$$

for any strategy $\{P_{st}\}$ of nature. This result together with (48) implies that

(66) $$L \le \frac{k - 1}{k + i}.$$

The fact that equality holds in (66) follows by applying the Strong Law of Large
Numbers to the quantities

(67) $$\frac{1}{m}\sum_{t=1}^{m} X_t \quad \text{and} \quad \frac{1}{m}\sum_{t=1}^{m} N_t$$

for the case where nature uses the strategy described in the Theorem above.

SOME CONVERGENCE THEOREMS FOR STATIONARY
STOCHASTIC PROCESSES^1


By T. Kawata 


Tokyo Institute of Technology and Princeton University 


1. Introduction. Let $\varepsilon(t)$ $(-\infty < t < \infty)$ be a continuous stationary process
of the second order (in the wide sense) with mean zero; that is,

(1.1) $$E\,\varepsilon(t + u)\overline{\varepsilon(t)} = \rho(u)$$

is a continuous function of $u$ only, and

(1.2) $$E\,\varepsilon(t) = 0, \quad -\infty < t < \infty.$$

Here $E$ means the expectation of a random variable.
We have, then,

(1.3) $$\varepsilon(t) = \int_{-\infty}^{\infty} e^{it\lambda}\,dZ(\lambda),$$

and

(1.4) $$\rho(u) = \int_{-\infty}^{\infty} e^{iu\lambda}\,dF(\lambda),$$

where $F(\lambda)$ is a bounded non-decreasing function such that
$$F(+\infty) - F(-\infty) = \rho(0) = E|\varepsilon(t)|^2,$$
and $Z(\lambda)$ is an orthogonal process such that

(1.5) $$E|Z(\lambda') - Z(\lambda)|^2 = F(\lambda' - 0) - F(\lambda - 0).$$

$F(\lambda)$ and $Z(\lambda)$ are called the spectral function and the random spectral function
of $\varepsilon(t)$ respectively. (See, e.g., Doob [5], Chapter XI). Let


(1.6) $$X(t) = f(t) + \varepsilon(t), \quad -\infty < t < \infty,$$

where $f(t)$ is a numerical valued function, and consider

(1.7) $$\int_{-\infty}^{\infty} X(t - s)K(s, n)\,ds,$$

$K(s, n)$ being also a numerical valued function depending on a parameter $n$.

Integrals of the type (1.7) appear in many fields in the theory of probability 
and statistics. For instance, we often encounter (1.6) in the problem of smooth- 
ing data of observed values, in the problem of predicting future values of x(t), 
and in the problem of estimating the spectral density of a stationary process. 


Received January 6, 1958; revised April 22, 1959.
1 Work done while I was a research fellow at Princeton University in 1957-58, supported
by The Rockefeller Foundation.

But here we shall consider the behavior of (1.6) from the analytical point
of view along the lines of the classical theory of the general Fourier integral,
and we shall show convergence theorems, some of which may be already known
implicitly.


Next we shall consider the special case of (1.7) 
S 
(17a) J X(s)K(s) ds = J(T). 
T Jur 


If $K(s) = e^{-i\xi s}$, then $|J(T)|^2$ may be considered as a function similar to the
periodogram, in which $f(t)$ is a trigonometric polynomial and $\varepsilon(t) = 0$. It is
known that if $F(\lambda)$ is absolutely continuous and $p(\lambda) = F'(\lambda)$ (the spectral
density of $\varepsilon(t)$), then $E|J(T)|^2$ converges to $p(\xi)$ provided $p(\lambda)$ is continuous
at $\xi$. We shall treat the convergence of $|J(T)|^2$ itself.


2. Preliminaries. Let the spectral function of the continuous (weakly) stationary
process $\varepsilon(t)$ be $F(\lambda)$ as in the preceding section. Then the necessary
and sufficient condition for the existence of

(2.1) $$\eta(t) = \int_{-\infty}^{\infty} \varepsilon(t - s)\,dL(s)$$ (see footnote 2)

for every $t$ is that there exists a function $k(x)$ such that

$$\int_{-\infty}^{\infty} |k(x)|^2\,dF(x) < \infty$$

and

$$\lim_{A, B\to\infty} \int_{-\infty}^{\infty} \Bigl|\int_{-A}^{B} e^{-isx}\,dL(s) - k(x)\Bigr|^2 dF(x) = 0,$$

where we assume that $L(s)$ is a function of bounded variation in every finite
interval. $k(x)$ is called the Fourier-Stieltjes transform of $L(s)$ with respect to
$F(x)$. In particular if $K(x) \in L_1(-\infty, \infty)$, then

(2.2) $$\int_{-\infty}^{\infty} \varepsilon(t - s)K(s)\,ds$$

exists.
We frequently use the following lemmas which are very well known.
LEMMA 2.1:

(i) The stochastic process (2.1) is also a stationary process in the wide sense
and we have $E\eta(t) = 0$ and

(2.3) $$E\,\eta(t + u)\overline{\eta(t)} = \int_{-\infty}^{\infty} |k(x)|^2 e^{iux}\,dF(x),$$

where $F(x)$ is the spectral function of $\varepsilon(t)$.


2 The integral is taken here as $\operatorname{l.i.m.}_{A,B\to\infty} \int_{-A}^{B} \varepsilon(t - s)\,dL(s)$, where l.i.m. means the limit
in the mean of order 2 and the finite integral in the definition is also defined as a Riemann-
Stieltjes integral, the limit process being taken as l.i.m. See M. Loève [10] or J. L. Doob [5].

(ii) If we are given another process

(2.4) $$\eta_1(t) = \int_{-\infty}^{\infty} \varepsilon(t - s)\,dL_1(s),$$

where $L_1(s)$ is of bounded variation in every finite interval and (2.4) is assumed
to exist, then

(2.5) $$E\,\eta(t + u)\overline{\eta_1(t)} = \int_{-\infty}^{\infty} k(x)\overline{k_1(x)}\,e^{iux}\,dF(x),$$

$k_1(x)$ being the Fourier-Stieltjes transform of $L_1(s)$ with respect to $F(x)$.
LEMMA 2.2: The stochastic process $\eta(t)$ in Lemma 2.1 can be represented as

(2.6) $$\eta(t) = \int_{-\infty}^{\infty} e^{itx}k(x)\,dZ(x),$$

$Z(x)$ being the random spectral function of $\varepsilon(t)$.
If $f(t)$ is a numerical function such that

$$\int_{-\infty}^{\infty} f(t - s)\,dL(s)$$

exists for every $t$ as an absolutely convergent Riemann-Stieltjes integral, and
$X(t) = f(t) + \varepsilon(t)$, then we define

$$\int_{-\infty}^{\infty} X(t - s)\,dL(s) = \int_{-\infty}^{\infty} \varepsilon(t - s)\,dL(s) + \int_{-\infty}^{\infty} f(t - s)\,dL(s).$$

3. Convergence theorems. In this section we shall consider processes of the
type

(3.1) $$Y_n(t) = n\int_{-\infty}^{\infty} X(t - s)K(ns)\,ds, \quad X(t) = f(t) + \varepsilon(t),$$

and discuss the convergence (in the mean) of $Y_n(t)$ as $n \to \infty$. Similar discussions
are classical when $\varepsilon(t) = 0$ in the theory of the Fourier integral; for example
we have the following fact which we shall state as

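LEMMA 3.1:^3 Suppose that

(i) $f(s)/(1 + |s|) \in L_1(-\infty, \infty)$,

(ii) $K(s) \in L_1(-\infty, \infty)$
and

(iii) $K(s) = o(|s|^{-1})$ when $|s| \to \infty$, and $K(s)$ is bounded. Then one has

(3.2) $$\lim_{n\to\infty} n\int_{-\infty}^{\infty} f(t - s)K(ns)\,ds = f(t)\int_{-\infty}^{\infty} K(s)\,ds.$$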


3 S. Bochner [1], S. Bochner-K. Chandrasekharan [2]. More general theorems are known.
See S. Bochner and S. Izumi [3].

Appealing to this lemma we have the following theorem.

THEOREM 3.1: Let $f(t)$ and $K(u)$ satisfy the conditions of Lemma 3.1. Then
we have

(3.3) $$\operatorname*{l.i.m.}_{n\to\infty}\ n\int_{-\infty}^{\infty} X(t - s)K(ns)\,ds = X(t)\int_{-\infty}^{\infty} K(s)\,ds.$$

The proof of this theorem will be omitted since it is very similar to and easier
than that of Theorem 3.2 below.
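
The deterministic part of this statement, the limit (3.2), can be illustrated numerically. The sketch below assumes a Gaussian density for $K$ (so that $\int K = 1$) and $f(s) = 1/(1+s^2)$; both are choices made here for illustration only, as are the function names and the midpoint-rule quadrature. After the substitution $u = ns$ the smoothed value is $\int f(t - u/n)K(u)\,du$.

```python
import math


def gaussian_kernel(u):
    """Standard normal density; integrates to 1 over the real line."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def f(s):
    return 1.0 / (1.0 + s * s)


def smoothed(t, n, half_width=12.0, steps=24_000):
    """Midpoint-rule value of n * integral f(t - s) K(n s) ds, using u = n s."""
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        u = -half_width + (j + 0.5) * h
        total += f(t - u / n) * gaussian_kernel(u)
    return total * h


t = 0.5
errors = [abs(smoothed(t, n) - f(t)) for n in (1, 10, 100)]
print(errors)
```

The printed errors should shrink as $n$ grows, consistent with the smoothed value approaching $f(t)\int K(s)\,ds = f(t)$.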

If we want to estimate the error between both sides of (3.3) for instance as 
o(1/n), it is necessary to prove a convergence theorem which contains an error 
term such as the following lemma: 

LEMMA 3.2: Suppose that

(i) $f(s)/(1 + |s|) \in L_1(-\infty, \infty)$,
(ii) $f(t + u) - f(t) = O(u)$ for small $u$,
(iii) $(1 + |s|)K(s) \in L_1$, and
(iv) $K(s)$ is bounded and $o(|s|^{-2})$ as $|s| \to \infty$.
Then one has

(3.4) $$n\int_{-\infty}^{\infty} f(t - s)K(ns)\,ds = f(t)\int_{-\infty}^{\infty} K(s)\,ds + o(1/\sqrt{n}).$$
PROOF: Put
$$I = n\int_{-\infty}^{\infty} f(t - s)K(ns)\,ds - f(t)\int_{-\infty}^{\infty} nK(ns)\,ds.$$

We want to prove

(3.5) $$I = o(1/\sqrt{n}).$$

We have
$$I = n\int_{-\infty}^{\infty} [f(t - s) - f(t)]K(ns)\,ds = n\int_{|s| \le a/n^{1/2}} [f(t - s) - f(t)]K(ns)\,ds$$
$$+\ n\int_{|s| > a/n^{1/2}} f(t - s)K(ns)\,ds - nf(t)\int_{|s| > a/n^{1/2}} K(ns)\,ds = I_1 + I_2 + I_3,$$

say, where $a$ is an arbitrary positive number fixed for the moment.
By (iii), we have

(3.5) $$I_3 = O\Bigl(\frac{n^{3/2}}{a}\int_{|s| > a/n^{1/2}} |sK(ns)|\,ds\Bigr) = O\Bigl(\frac{1}{a\,n^{1/2}}\int_{|u| > a n^{1/2}} |uK(u)|\,du\Bigr) = o\Bigl(\frac{1}{\sqrt{n}}\Bigr)$$

as $n \to \infty$.
By (ii), we have

(3.6) $$I_1 = O\Bigl(n\int_{|s| \le a/n^{1/2}} |sK(ns)|\,ds\Bigr) = O\Bigl(a\,n^{1/2}\int_{|s| \le a/n^{1/2}} |K(ns)|\,ds\Bigr) = O\Bigl(\frac{a}{\sqrt{n}}\int_{-\infty}^{\infty} |K(u)|\,du\Bigr) = O\Bigl(\frac{a}{\sqrt{n}}\Bigr).$$

Lastly we have, by making use of (iv) and (i),

(3.7) $$I_2 = O\Bigl(n\int_{|s| > a/n^{1/2}} |f(t - s)|\,|K(ns)|\,ds\Bigr) = o\Bigl(\frac{1}{\sqrt{n}}\Bigr).$$

Combining (3.5), (3.6) and (3.7), we get
$$\limsup_{n\to\infty} \sqrt{n}\,|I| = O(a).$$

Since $a$ is arbitrary, we must have
$$\lim_{n\to\infty} \sqrt{n}\,I = 0,$$

which proves the lemma.

We shall prove 

THEOREM 3.2: If the conditions (i), (ii), (iii) and (iv) of Lemma 3.2 are satisfied,
and the spectral function $F(x)$ of a continuous stationary process $\varepsilon(t)$ satisfies

(3.8) $$\int_{-\infty}^{\infty} |x|\,dF(x) < \infty,$$

then

(3.9) $$E\Bigl|\,n\int_{-\infty}^{\infty} X(t - s)K(ns)\,ds - X(t)\int_{-\infty}^{\infty} K(s)\,ds\Bigr|^2 = o\Bigl(\frac{1}{n}\Bigr),$$


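where $X(t) = f(t) + \varepsilon(t)$.
PROOF: We have
$$I = n\int_{-\infty}^{\infty} X(t - s)K(ns)\,ds - X(t)\,n\int_{-\infty}^{\infty} K(ns)\,ds = n\int_{-\infty}^{\infty} [X(t - s) - X(t)]K(ns)\,ds$$
$$= n\int_{-\infty}^{\infty} [\varepsilon(t - s) - \varepsilon(t)]K(ns)\,ds + n\int_{-\infty}^{\infty} [f(t - s) - f(t)]K(ns)\,ds = I_1 + I_2.$$
Since $EI_1 = 0$, we have
$$E|I|^2 = E|I_1|^2 + |I_2|^2.$$
Lemma 3.2 shows $|I_2|^2 = o(1/n)$. Hence it is sufficient to show that

(3.10) $$E|I_1|^2 = o(1/n).$$

We may now write
$$I_1 = n\int_{-\infty}^{\infty} \varepsilon(t - s)K(ns)\,ds - \int_{-\infty}^{\infty} \varepsilon(t - s)\,du(s)\int_{-\infty}^{\infty} K(s)\,ds,$$
where $u(s) = 0$ for $s < 0$, $= 1$ for $s > 0$. $u(s)$ has the Fourier-Stieltjes transform
identically equal to 1. Hence by Lemma 2.1 (2.3), we have
$$I_1 = \int_{-\infty}^{\infty} \varepsilon(t - s)\,d\Bigl(n\int_{-\infty}^{s} K(n\xi)\,d\xi - u(s)\int_{-\infty}^{\infty} K(\xi)\,d\xi\Bigr),$$
$$E|I_1|^2 = \int_{-\infty}^{\infty} \Bigl|\,n\int_{-\infty}^{\infty} (e^{-isx} - 1)K(ns)\,ds\Bigr|^2 dF(x).$$
Minkowski's inequality shows
$$\bigl(E|I_1|^2\bigr)^{1/2} \le n\int_{-\infty}^{\infty} |K(ns)|\Bigl(\int_{-\infty}^{\infty} |e^{-isx} - 1|^2\,dF(x)\Bigr)^{1/2} ds,$$
which, since $|e^{-isx} - 1|^2 \le \min(s^2x^2,\ 2|sx|)$, does not exceed
$$n\int_{-\infty}^{\infty} |K(ns)|\Bigl\{s^2\int_{|x| \le G} x^2\,dF(x) + 2|s|\int_{|x| > G} |x|\,dF(x)\Bigr\}^{1/2} ds,$$
$G$ being an arbitrary positive number. This does not exceed
$$n\int_{-\infty}^{\infty} |s|\,|K(ns)|\,ds\,\Bigl(\int_{|x| \le G} x^2\,dF(x)\Bigr)^{1/2} + 2^{1/2}\,n\int_{-\infty}^{\infty} |s|^{1/2}|K(ns)|\,ds\,\Bigl(\int_{|x| > G} |x|\,dF(x)\Bigr)^{1/2}$$
$$= \frac{1}{n}\int_{-\infty}^{\infty} |uK(u)|\,du\,\Bigl(\int_{|x| \le G} x^2\,dF(x)\Bigr)^{1/2} + \Bigl(\frac{2}{n}\Bigr)^{1/2}\int_{-\infty}^{\infty} |u|^{1/2}|K(u)|\,du\,\Bigl(\int_{|x| > G} |x|\,dF(x)\Bigr)^{1/2}.$$
Hence we have
$$\limsup_{n\to\infty}\ nE|I_1|^2 = O\Bigl(\int_{|x| > G} |x|\,dF(x)\Bigr),$$
which proves (3.10), since $G$ may be arbitrarily large. Thus the theorem is
proved.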

If further conditions are imposed on $f(t)$ and $F(x)$, then we can go further
and get the asymptotic expression of $\int_{-\infty}^{\infty} X(t - s)\,dK(s)$. We shall leave this
until another occasion.


4. Wiener's formula. Wiener was concerned with the formula

$$\lim_{a\to 0} \int_{-\infty}^{\infty} f(t)\,aK(at)\,dt = \int_{-\infty}^{\infty} K(t)\,dt \cdot \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} f(t)\,dt,$$

under suitable conditions on $f(t)$ and $K(t)$. We shall consider the similar formula
concerning a stationary process. Let $X(t) = f(t) + \varepsilon(t)$ as in the preceding
sections. It seems convenient first of all to state a remark.

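It is known as the law of large numbers that $(1/2T)\int_{-T}^{T} \varepsilon(t)\,dt$ is convergent
in mean as $T \to \infty$ and actually

$$\operatorname*{l.i.m.}_{T\to\infty}\ \frac{1}{2T}\int_{-T}^{T} \varepsilon(t)e^{-i\xi t}\,dt = Z(\xi + 0) - Z(\xi - 0),$$

where $\xi$ is any number and $Z(x)$ is the random spectral function. This is also
well known (Doob [5]). Hence if

$$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} f(t)e^{-i\xi t}\,dt = M_\xi$$

exists for some $\xi$, then

(4.1) $$\operatorname*{l.i.m.}_{T\to\infty}\ \frac{1}{2T}\int_{-T}^{T} X(t)e^{-i\xi t}\,dt = Z(\xi + 0) - Z(\xi - 0) + M_\xi.$$

Now we consider

(4.2) $$\int_{-\infty}^{\infty} \varepsilon(t)e^{-i\xi t}\,aK(at)\,dt.$$

Then it is easy to show
THEOREM 4.1: If $K(t) \in L_1(-\infty, \infty)$, then

(4.3) $$\operatorname*{l.i.m.}_{a\to 0}\ \int_{-\infty}^{\infty} \varepsilon(t)e^{-i\xi t}\,aK(at)\,dt = [Z(\xi + 0) - Z(\xi - 0)]\int_{-\infty}^{\infty} K(t)\,dt.$$

For putting the representation (1.3) into (4.2), assuming $\xi = 0$ without
loss of generality, we have

$$\int_{-\infty}^{\infty} \varepsilon(t)\,aK(at)\,dt = \int_{-\infty}^{\infty} \Bigl(\int_{-\infty}^{\infty} e^{itx/a}K(t)\,dt\Bigr) dZ(x),$$

and $\int_{-\infty}^{\infty} e^{itx/a}K(t)\,dt$ tends to zero boundedly as $a \to 0$ when $x \ne 0$ by the
Riemann-Lebesgue lemma and is $\int_{-\infty}^{\infty} K(t)\,dt$ when $x = 0$ $(a \ne 0)$. Here we used
the fact that if $\int_{-\infty}^{\infty} |g_n(x) - g(x)|^2\,dF(x) \to 0$, then

$$\int_{-\infty}^{\infty} g_n(x)\,dZ(x) \to \int_{-\infty}^{\infty} g(x)\,dZ(x).$$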

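Now a Wiener-type formula of S. Bochner's states [1]: if

(i) $K(x)$ is absolutely continuous in every finite interval,
(4.4) (ii) $|x^2K(x)| < H$, $K(x) \in L_1(-\infty, \infty)$, $H$ being a constant,
(4.5) (iii) $\dfrac{1}{2T}\displaystyle\int_{-T}^{T} |f(t)|\,dt < G$, $G$ being a constant, and
(iv) $M = \displaystyle\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} f(t)\,dt$ exists,

then

(4.6) $$\lim_{a\to 0} \int_{-\infty}^{\infty} f(t)\,aK(at)\,dt = M\int_{-\infty}^{\infty} K(t)\,dt.$$

This fact and Theorem 4.1 show immediately that

(4.7) $$\operatorname*{l.i.m.}_{a\to 0}\ \int_{-\infty}^{\infty} X(t)e^{-i\xi t}\,aK(at)\,dt = [M_\xi + Z(\xi + 0) - Z(\xi - 0)]\int_{-\infty}^{\infty} K(t)\,dt.$$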
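From (4.7) and (4.1) the following theorem follows immediately.
THEOREM 4.2: If conditions (i), (ii) and (iii) above are satisfied and

$$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} f(t)e^{-i\xi t}\,dt$$

exists for some $\xi$, then

(4.8) $$\operatorname*{l.i.m.}_{a\to 0}\ \int_{-\infty}^{\infty} X(t)e^{-i\xi t}\,aK(at)\,dt = \operatorname*{l.i.m.}_{T\to\infty}\ \frac{1}{2T}\int_{-T}^{T} X(t)e^{-i\xi t}\,dt \int_{-\infty}^{\infty} K(t)\,dt.$$

Formula (4.8) means that both sides exist and are equal.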
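
The deterministic formula (4.6) can be checked on a simple case. The sketch below assumes $f(t) = 1 + \cos t$, whose mean value is $M = 1$, and a Gaussian density for $K$; these choices, the function names and the midpoint-rule quadrature are illustrative assumptions only. For this pair the integral has the closed form $1 + e^{-1/(2a^2)}$, which tends to $M\int K = 1$ as $a \to 0$.

```python
import math


def gaussian_kernel(u):
    """Standard normal density; integrates to 1 over the real line."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def f(t):
    return 1.0 + math.cos(t)   # mean value over [-T, T] tends to M = 1


def wiener_average(a, half_width=12.0, steps=240_000):
    """Midpoint-rule value of integral f(t) a K(a t) dt, using u = a t."""
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        u = -half_width + (j + 0.5) * h
        total += f(u / a) * gaussian_kernel(u)
    return total * h


for a in (2.0, 0.5, 0.1):
    print(a, wiener_average(a), 1.0 + math.exp(-1.0 / (2.0 * a * a)))
```

The numerical and closed-form columns should agree, and both approach 1 as $a$ decreases.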




5. Periodogram. Let $X(t) = f(t) + \varepsilon(t)$ as before. We suppose in this section
that the spectral function $F(x)$ of $\varepsilon(t)$ is absolutely continuous and we denote
the spectral density as $p(x)$:

$$\int p(x)\,dx = F(x).$$

It is known and easily proved that

(5.1) $$\lim_{T\to\infty} E\,\frac{1}{4\pi T}\Bigl|\int_{-T}^{T} \varepsilon(t)e^{-ixt}\,dt\Bigr|^2 = p(x),$$

provided $p(x)$ is continuous at $x$.
Now we suppose that 


T 
ge ( —izt 
(5.2) lim a , [se a 


exists for some x and for some 0 S a < 1. 
Then we have 


ll 
= 
~ 

S 
R 


r —trt 1 a c —irt 1 | y txt . 
[" any ] , driiment It\. 
o “p X(t)e aul { TE » ektye dt | f(e dt 


Hence we get, letting T — ~, 


lim ae B| [ X(t)e™ dt| 
(5.3) = p(x), if a> 3, 
= p(x) + M.,, if a = }, 
= 0, if OSa<} and M,, ¥ 0. 


Now we consider the mean convergence of

(5.4) $$\frac{1}{4\pi T}\Bigl|\int_{-T}^{T} X(t)e^{-ixt}\,dt\Bigr|^2$$

when $T \to \infty$. We shall call (5.4) the periodogram of $X(t)$, mentioning that
this is exactly the periodogram of $f(t)$ if $f(t)$ is a trigonometric polynomial and
$\varepsilon(t) = 0$. Many authors suggest (U. Grenander [6], U. Grenander and M.
Rosenblatt [7, 8], Z. A. Lomnicki and S. K. Zaremba [11]) that (5.4) or

(5.5) $$J(T) = \frac{1}{4\pi T}\Bigl|\int_{-T}^{T} \varepsilon(t)e^{-ixt}\,dt\Bigr|^2$$

does not converge to $p(x)$.


We shall discuss in the following sections the behavior of (5.5) and prove
that $J(T)$ does not converge in mean to any random variable when $\varepsilon(t)$ behaves
like a stationary Gaussian process in a certain sense. In the case where $\varepsilon(t)$ is a
stationary Gaussian process U. Grenander and M. Rosenblatt gave extensive
discussions (e.g. U. Grenander and M. Rosenblatt [8]).

6. Theorems on the periodogram. We shall impose further conditions on
$\varepsilon(t)$. We suppose hereafter that $\varepsilon(t)$ is real, $E|\varepsilon(t)|^4 < \infty$ for every $t$, and

(6.1) $$E\,\varepsilon(s)\varepsilon(s + u)\varepsilon(s + v)\varepsilon(s + w) = P(u, v, w)$$

is a function of $u$, $v$ and $w$ alone and independent of $s$; that is, $\varepsilon(t)$ is a stationary
process of the fourth order. Further let $P(u, v, w)$ be a continuous function of
$u$, $v$ and $w$ in the whole range $R_3$.

Put

(6.2) $$P(u, v, w) = Q(u, v, w) + P_0(u, v, w),$$

where

(6.3) $$P_0(u, v, w) = \rho(u)\rho(v - w) + \rho(v)\rho(w - u) + \rho(w)\rho(u - v),$$

$\rho(u)$ being the covariance function (1.4) of $\varepsilon(t)$ as before. If $\varepsilon(t)$ is a Gaussian
process, then $Q(u, v, w) = 0$. Thus $Q(u, v, w)$ will be considered as a measure
of non-Gaussianness and was introduced by Magness (T. A. Magness [12], see
also E. Parzen [13]). We also assume that $Q(u, v, w)$ is the Fourier transform
of a function $q(\xi', \eta', \zeta')$ which is integrable in $R_3$, bounded, continuous and
satisfies the Lipschitz condition at a point $(-\xi, -\xi, \xi)$,

(6.4) $$Q(u, v, w) = \Bigl(\frac{1}{2\pi}\Bigr)^{3/2} \iiint q(\xi', \eta', \zeta')\,e^{i(u\xi' + v\eta' + w\zeta')}\,d\xi'\,d\eta'\,d\zeta'.$$

Let $\varepsilon(t)$ have the spectral density $p(x)$, assumed to be continuous at $x = \xi$
and bounded.

Under these conditions, we shall prove the following theorem.
THEOREM 6.1:

$$J(T) = \frac{1}{4\pi T}\Bigl|\int_{-T}^{T} \varepsilon(t)e^{-i\xi t}\,dt\Bigr|^2$$

satisfies the limit relation

(6.5) $$\lim_{T' > T\to\infty} \Bigl\{E|J(T) - J(T')|^2 - \Bigl(1 - \frac{T}{T'}\Bigr)2p^2(\xi)\Bigr\} = 0$$

if $\xi \ne 0$, and

(6.6) $$\lim_{T' > T\to\infty} \Bigl\{E|J(T) - J(T')|^2 - \Bigl(1 - \frac{T}{T'}\Bigr)4p^2(0)\Bigr\} = 0$$

if $\xi = 0$.

The theorem implies that $E|J(T) - J(T')|^2$ does not converge to zero as $T' > T \to \infty$; in other
words $J(T)$ never converges in mean except at a point where $p(\xi) = 0$.
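
A discrete-time analogue gives a quick feel for this statement. The sketch below is an assumption-laden illustration, not the continuous-time setting of the theorem: it uses Gaussian white noise with unit variance (spectral density $p = 1/2\pi$), a single Fourier frequency away from 0 and $\pi$, and hypothetical function names. If the periodogram ordinate converged in mean square to $p$, the ratio $E[I^2]/p^2$ would tend to 1 as the record length grows; instead it stays near 2.

```python
import cmath
import math
import random


def periodogram_ordinate(x, lam):
    """I_N(lam) = |sum_t x_t e^{-i lam t}|^2 / (2 pi N) for a real sequence x."""
    total = sum(x_t * cmath.exp(-1j * lam * t) for t, x_t in enumerate(x))
    return abs(total) ** 2 / (2.0 * math.pi * len(x))


def second_moment_ratio(n, replications, seed=0):
    """Monte Carlo estimate of E[I_N(lam)^2] / p^2 for Gaussian white noise."""
    rng = random.Random(seed)
    lam = 2.0 * math.pi * (n // 4) / n   # a Fourier frequency away from 0 and pi
    p = 1.0 / (2.0 * math.pi)            # spectral density of unit-variance white noise
    acc = 0.0
    for _ in range(replications):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        acc += periodogram_ordinate(x, lam) ** 2
    return acc / replications / (p * p)


ratios = {n: second_moment_ratio(n, replications=2000) for n in (32, 128)}
print(ratios)
```

The ratio near 2 for both record lengths mirrors the limit $2p^2(\xi)$ for $\xi \ne 0$ obtained later in (10.8): lengthening the record does not reduce the spread of the periodogram ordinate.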

As a theorem for the covariance of $J(T)$ we get under our assumptions above

THEOREM 6.2: We have

(6.7) $$\lim_{T' > T\to\infty} \Bigl\{\operatorname{Cov}(J(T), J(T')) - \frac{T}{T'}\,p^2(\xi)\Bigr\} = 0$$

if $\xi \ne 0$, and

(6.8) $$\lim_{T' > T\to\infty} \Bigl\{\operatorname{Cov}(J(T), J(T')) - \frac{2T}{T'}\,p^2(0)\Bigr\} = 0.$$

This follows immediately from the fact that

(6.9) $$\lim_{T' > T\to\infty} \Bigl\{EJ(T)J(T') - \Bigl(1 + \frac{T}{T'}\Bigr)p^2(\xi)\Bigr\} = 0,$$

if $\xi \ne 0$ and

(6.10) $$\lim_{T' > T\to\infty} \Bigl\{EJ(T)J(T') - \Bigl(1 + \frac{2T}{T'}\Bigr)p^2(0)\Bigr\} = 0.$$

The proofs of the above theorems will be given in Section 10.


7. Lemmas. It seems convenient to state lemmas in advance which will be
used in the course of the proofs of the theorems.
LEMMA 7.1: Let $p(x) \in L_1(-\infty, \infty)$ and be continuous. Then we have

(7.1) $$\lim_{T\to\infty} \frac{1}{\pi T}\int_{-\infty}^{\infty} p(x)\,\frac{\sin T(x + \xi)\,\sin T(x - \xi)}{(x + \xi)(x - \xi)}\,dx = \begin{cases} p(0), & \text{if } \xi = 0, \\ 0, & \text{if } \xi \ne 0. \end{cases}$$

The integral when $\xi = 0$ is the Fejér integral and the case $\xi = 0$ is very well
known. The case $\xi \ne 0$ was proved by U. Grenander [6]. Some Fourier integral
theorems involving the integral (7.1) and having a close connection with estimation
theory of the spectral density of a stationary process were discussed by
the author recently (T. Kawata [8]).

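LEMMA 7.2: Let $p(x) \in L_1(-\infty, \infty)$ and be continuous. Then we have

(7.2) $$\lim_{T' > T\to\infty} \Bigl\{\frac{1}{\pi\sqrt{TT'}}\int_{-\infty}^{\infty} p(x)\,\frac{\sin T(x + \xi)\,\sin T'(x + \xi)}{(x + \xi)^2}\,dx - p(\xi)\sqrt{\frac{T}{T'}}\Bigr\} = 0$$

and

(7.3) $$\lim_{T' > T\to\infty} \frac{1}{\pi\sqrt{TT'}}\int_{-\infty}^{\infty} p(x)\,\frac{\sin T(x + \xi)\,\sin T'(x - \xi)}{(x + \xi)(x - \xi)}\,dx = 0, \quad \text{if } \xi \ne 0.$$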


LEMMA 7.3: Let $\delta$ be any positive number and let $S(\delta)$ be the domain $|x| < \delta$,
$|y| < \delta$, $|z| < \delta$ in $R_3$, the three dimensional Euclidean space. Then

(7.4) $$\iiint_{R_3 - S(\delta)} \Bigl|\frac{\sin T(x + y + z)}{x + y + z}\,\frac{\sin Tx}{x}\,\frac{\sin T'y}{y}\,\frac{\sin T'z}{z}\Bigr|\,dx\,dy\,dz = O(T\log^2 T')$$

as $T$ and $T'$ tend to infinity in any way.
LEMMA 7.4: If $\varphi(x, y, z)$ is bounded and satisfies
$$|\varphi(x, y, z)| \le C(|x| + |y| + |z|)$$
for some constant $C$, near the origin, then
$$\iiint \Bigl|\varphi(x, y, z)\,\frac{\sin T(x + y + z)}{x + y + z}\,\frac{\sin Tx}{x}\,\frac{\sin T'y}{y}\,\frac{\sin T'z}{z}\Bigr|\,dx\,dy\,dz = O(\log^2 T\,\log T' + T\log^2 T')$$
as $T$ and $T'$ tend to infinity in such a way that $T' > T$.


We should like to add a remark. Lemma 7.4 suggests that we should have a
convergence theorem like

(7.6) $$\lim_{T\to\infty} \frac{1}{\pi^3}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x, y, z)\,\frac{\sin T(x + y + z)\,\sin Tx\,\sin Ty\,\sin Tz}{T(x + y + z)\,xyz}\,dx\,dy\,dz = f(0, 0, 0).$$

In fact, under some conditions, (7.6) is true, and a more general theorem was
proved by Bochner and the author [4].
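We shall prove Lemma 7.2. Since

$$\frac{1}{\pi\sqrt{TT'}}\int_{-\infty}^{\infty} \frac{\sin T(x + \xi)\,\sin T'(x + \xi)}{(x + \xi)^2}\,dx = \frac{1}{\pi\sqrt{TT'}}\int_{-\infty}^{\infty} \frac{\sin Tw\,\sin T'w}{w^2}\,dw = \sqrt{\frac{T}{T'}} \quad (T' \ge T),$$

we have

(7.7) $$\frac{1}{\pi\sqrt{TT'}}\int_{-\infty}^{\infty} p(x)\,\frac{\sin T(x + \xi)\,\sin T'(x + \xi)}{(x + \xi)^2}\,dx - p(\xi)\sqrt{\frac{T}{T'}}$$
$$= \frac{1}{\pi\sqrt{TT'}}\int_{-\infty}^{\infty} [p(w - \xi) - p(\xi)]\,\frac{\sin Tw\,\sin T'w}{w^2}\,dw = \frac{1}{\pi\sqrt{TT'}}\int_{|w| > \delta} + \frac{1}{\pi\sqrt{TT'}}\int_{|w| < \delta},$$

where $\delta$ is a positive number such that for a given $\epsilon > 0$,

(7.8) $$|p(w - \xi) - p(\xi)| < \epsilon \quad \text{for } |w| < \delta,$$

because of the continuity and evenness of $p(x)$. The first term of (7.7) converges
to zero as $T, T' \to \infty$, and the second term is less than

$$\frac{\epsilon}{\pi\sqrt{TT'}}\int_{-\infty}^{\infty} \Bigl|\frac{\sin Tw\,\sin T'w}{w^2}\Bigr|\,dw \le \frac{\epsilon}{\pi\sqrt{TT'}}\Bigl(\int_{-\infty}^{\infty} \frac{\sin^2 Tw}{w^2}\,dw \int_{-\infty}^{\infty} \frac{\sin^2 T'w}{w^2}\,dw\Bigr)^{1/2} = \epsilon.$$

We shall next prove (7.3). We can easily prove, by the Parseval relation,

$$\frac{1}{\pi\sqrt{TT'}}\int_{-\infty}^{\infty} \frac{\sin T(x + \xi)\,\sin T'(x - \xi)}{(x + \xi)(x - \xi)}\,dx = \frac{\sin 2T\xi}{2\xi\sqrt{TT'}}$$

if $T' \ge T$, and this tends to 0. Hence the left hand side of (7.3) becomes

$$\lim_{T' > T\to\infty} \frac{1}{\pi\sqrt{TT'}}\int_{-\infty}^{\infty} [p(x) - p(\xi)]\,\frac{\sin T(x + \xi)\,\sin T'(x - \xi)}{(x + \xi)(x - \xi)}\,dx.$$

Dividing the integral into three parts as

$$\int_{|x - \xi| < \delta} + \int_{|x + \xi| < \delta} + \int_{|x - \xi| > \delta,\ |x + \xi| > \delta},$$

$\delta$ being chosen so that $\delta < |\xi|$, and proceeding as in the proof of (7.2),
we can prove (7.3).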

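8. Proof of Lemma 7.3. We shall change the notation for simplicity. We
write $T_0 = T$, $T_1 = T$, $T_2 = T'$, $T_3 = T'$ and $x_1, x_2, x_3$ for $x, y, z$ respectively.
Denote $D_0 = [\,|x_i| > \delta,\ i = 1, 2, 3\,]$. The integral in (7.4) is written as

$$I = \iiint_{R_3 - S(\delta)} \Bigl|\frac{\sin T_0\bigl(\sum_{i=1}^{3} x_i\bigr)}{\sum_{i=1}^{3} x_i}\,\prod_{i=1}^{3} \frac{\sin T_i x_i}{x_i}\Bigr|\,dv,$$

$dv$ being a volume element in $R_3$, which we divide as

(8.1) $$I = \iiint_{D_0} + \sum_{i=1}^{3} \iiint_{D_i} + \sum_{i < j} \iiint_{D_{ij}} = I_1 + I_2 + I_3,$$

say, $D_i$ being the domain $D_0 - [\,|x_i| > \delta\,]$, and $D_{ij}$ being
$D_0 - [\,|x_i| > \delta,\ |x_j| > \delta\,]$.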

The first integral of the right hand side of (8.1) will be further divided into
integrals of four types such as

(8.2) $$\iiint_{x_1, x_2, x_3 > \delta},$$
(8.3) $$\iiint_{x_i, x_j > \delta,\ x_k < -\delta},$$
(8.4) $$\iiint_{x_i > \delta,\ x_j, x_k < -\delta},$$
(8.5) $$\iiint_{x_1, x_2, x_3 < -\delta},$$

where $i, j, k$ are distinct. We shall estimate each of the integrals successively.
First (8.2) is not greater than


IE petal 
2).29,23>86 Ty Le X3(X1 + e+ 23) 








diz [” diz dx, “das [“logl(S + x2 + xs)/8) 
wo ASE SL atea Pee 
, f° dx: log (x2 + 23)/6 a dz; | log x2 | + log 
aia [* ef a2(x2 + 23) dh & cfs af ~ ta(a2 + 5) 


‘ “a r+é6 
= 2C I Z | log x | log — dx 


which is finite. Here $C$ is a constant $C(\delta)$ which may differ on each occurrence.
Considering the integral of type (8.3), we shall have, for instance


SSferex>8,25<- 


which is 


(8.7) [ff sin T(x i+ — 23) Ty te sin 7; 2; re 
21,.%2,%3>86 | | 


ti + 2 — 23 inl 





The integrand of (8.7) does not exceed 1/(2%2%3| 21 + 22 — 23| ). If wein- 


tegrate this over 2; , 22, %3 > 6, | 71 + x2 — 23| > 6/2, then we see that it is not 
greater than 


[ dx, * dx, 1 t + t+ 6/2 
= — ———— log 
o 6) Je Xe (t+ 22) 6/2 





dx * die 1 t + 2 — 6/2 
+[% PTS 8/2 
and it can be easily shown, as in the estimation of (8.6), that this is finite. 

On the other hand the integrand of (8.7) over the domain 2, 22, 23; > 4, 
| ay + 22 — 23| < 6/2 does not exceed T(1/z;r2r3), and the integral over the 
domain can be proved to be SCT’. Hence it has been shown that (8.2) = O(T). 

The integrals of type (8.3) will be O(7'), which is also shown easily. Each 
of the integrals of type (8.4) is just the same as the corresponding integral in 
(8.3) and the integral (8.4) is the same one as (8.2). Hence we get 


(8.8) I, = O(T) 


in (8.1). 
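Next we shall consider $I_2$ in the right hand side of (8.1). For instance

(8.9) $$\iiint_{D_1} = \iiint_{|x_1| \le \delta,\ |x_2|, |x_3| > \delta} \Bigl|\frac{\sin T_0\bigl(\sum_{i=1}^{3} x_i\bigr)}{\sum_{i=1}^{3} x_i}\,\prod_{i=1}^{3} \frac{\sin T_i x_i}{x_i}\Bigr|\,dv = \iiint_{D_{11}} + \iiint_{D_{12}},$$

where $D_{11} = [\,|x_1| \le \delta,\ |x_2| > \delta,\ |x_3| > \delta,\ |x_1 + x_2 + x_3| > \delta\,]$
and $D_{12} = [\,|x_1| \le \delta,\ |x_2| > \delta,\ |x_3| > \delta,\ |x_1 + x_2 + x_3| < \delta\,]$.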
If we write 


Diyi=([|21| <8/2) Di (6/2<21 <8) 


then the latter is found to be bounded as in the arguments in the first step of 
the estimation of (8.7). The first integral does not exceed 











/ sin Tx hei dre / dx3 
jz1|<8/2 vy |zel>6 X2 ; |zg|>6,|2y+2q+23/>8 | a(21 + 2 + x3) | 
< af S| én, { da, dx; 
~  Stay<si2| lzgi>s | 2 | Jizgi>8.leetes|>8/2 | Za(t2 + 2s) | 


since | 2 + x2 + 23| > 6, | a, | < 6/2 implies | x2 + 2; | > 6/2. The last integra! 
is not greater than 


~ / dx2 / dis / | sin Tx; De 
— | ene | ee — ° 1 
leol>s | Xe | Jizsi>slegtes!>8/2 | (Xe + 23) | Jiesicss2| TM 





in which the first factor is O(1) as in the evaluation of (8.7) and the second 
integral is O(log 7’) as is known since it is the Lebesgue constant. Thus 


(8.11) In = O(log T). 


Next J will be computed, being written as 


(312) n= [ff +f ff 
Dir (leitee+e3| <1/7) Di1° (86> |214+424+23|>1/T) 


We consider the integral over the domain Diy: | 2; | < 6/2, | 22 | > 6,|23| > 4, 
| 24 + ro + 23 | < Lt, 


(8.13) Tf. 


in place of the first integral in the right side of (8.12). The remaining part of 
the integral can be estimated in the same way as was done in J, , to be O(T). 
Thus we have 


[ff r{ff yk ty De 
Dit Pia | I] i 


lA 


af sin Tx; sin T’z2 sin T’ x; 
T — dx, ——<—<—<—$S s|_—— diy dz 
|2,|<6/2 | |zal.lz3|>6 


ry 2 r3 


IIA 


ja; +zq+z3|<1/T 


sin = sin T’x2 sin T’ X3 
ar [ — | dx ff | —-———— dig diz 
|z1|<8/2 r2>5,23<—8 | X3 


2 
|\2y+zqte3|<1/T 


since $x_2 > \delta$, $x_3 > \delta$, $|x_1 + x_2 + x_3| < 1/T$ is impossible for large $T$. The last
integral does not exceed


: sin Tx © dx dx 
ar [ : | dx, [ — a 
}21|<8/2 v1 6 Le J— 


zo—z,-1/T 3 


sin Tx, r 1 (+3 2 1 ae 
2T —id — | 1 
jal<s/2} x 6 2 - + Taw+x,—1/T 


ae sin Tx, vi: die 
Cz — | daz 
|21|<6/2 r " 1 1222 —-é§— 1/ 'T) 





ll 








lA 


IA 


cf nae sin 121 | az, = O(log 7). 
ZaI1< | } 


vy 


Hence we get that the first integral of the right hand side of J is O(T). 


We next consider the second integral of J in (8.12), which is not greater 
than 


Tx | 1 ] 
(8.14) 2/ sin TH | 3, id. ioemapapsiniteanainanninin tain tiie J 
|x| <8/2 : r3<-8 (a: + 22+ %)%%\ 


<|21+22+23/ <8 





The inner integral, by a change of a variable, becomes 
[| dx, dx; 
z9,23>6 | ty + re —23 | Le X3 
1/T < |2\+22—23| <6 


which is not greater than the sum 


| © dxe | " dz; [s dz mf drs 
+ 
& Xe Jzy4zot1/7 (Xe — Xi — X2)Xs z 


i+z2—-1/T >2z3 (ay +2&- - 23) 23 





/ 1 
$C log T d 
. os x2(21 + x Ze ) 8 (1 + 2) e 


This is easily proved to be O(log 7’) and hence (8.12) is O(log’ 7), Lebesgue’s 
constant being involved. Hence we get 


= O(T) + O(log’ T) = O(T). 


Inserting this result and (8.10) into (8.9), we have shown $\iiint_{D_1} = O(T)$.
Similar arguments show that

$$\iiint_{D_2} = O(T) + O(\log T\,\log T')$$
and
$$\iiint_{D_3} = O(T) + O(\log T\,\log T').$$

The domains $D_2$ and $D_3$ are defined analogously to $D_1$ and the above estimates
are easily verified. Combining these results, we get

(8.14) $$I_2 = O(T\log T').$$

Lastly we shall consider $I_3$. We shall treat, for instance, the integral over
$D_{12}$, that is,


* 4s ' 
[Tf - If] sin T( 2) Il Ge oe 8 dx, dxz dx; 
Die [21 | <8,|z2| <4,|zg/>8 | > zi jm 2: 


Bh wise 211 <5,| 221 <5,5<|23| < 35 


Replacing | x; + 22 + 23| by $|23|, because of 


(8.15) 


[a1 + t2 + 23| > |223|-—|u|—| | > 3/2], 


we see that the first integral does not exceed 








2 jain 7 ae / ee, 
lni<s| |z2| <6 | 4 |z3|>38 
This is clearly 
(8.16) O(log T log T’). 
The latter integral of (8.15) is not greater than 
: : , . - 
1 Seman Tx, dz, —- 2 Tis de, [ sin T(a + T+ 2s) dz; 
6 Jiz <a | v1 | |z2l<8 re } 58 < 231 < 38 | 1 + Xe + 23 
. . , ! | = 
< 1 sin T'x; de / sin T’x. dx, [ sin Tu du 
OJizni<cs| M1 | le2l<s te | [ul <58 u 


O (log’ T - log T’). 


We have thus reached $\iiint_{D_{12}} = O(\log^2 T\,\log T')$. The other integrals in $I_3$ may
be shown by similar arguments to be $O(\log^2 T\,\log T')$ or $O(\log^2 T'\,\log T)$.
Hence, combining these results, we get

(8.17) $$I_3 = O(\log^2 T\,\log T' + \log^2 T'\,\log T).$$

By (8.8), (8.14) and (8.17), we finally get $I = O(T\log^2 T')$.
9. Proof of Lemma 7.4. Let

(9.1) $$|\varphi(x, y, z)| \le C(|x| + |y| + |z|)$$

in $S(\delta)$: $|x|, |y|, |z| < \delta$. Let $M$ be the upper bound of $|\varphi(x, y, z)|$. We partition
the integral as

$$\iiint \Bigl|\varphi(x, y, z)\,\frac{\sin T(x + y + z)}{x + y + z}\,\frac{\sin Tx}{x}\,\frac{\sin T'y}{y}\,\frac{\sin T'z}{z}\Bigr|\,dx\,dy\,dz = \iiint_{S(\delta)} + \iiint_{R_3 - S(\delta)} = J_1 + J_2,$$

say. Then

(9.2) $$|J_2| \le M \iiint_{R_3 - S(\delta)} \Bigl|\frac{\sin T(x + y + z)}{x + y + z}\,\frac{\sin Tx}{x}\,\frac{\sin T'y}{y}\,\frac{\sin T'z}{z}\Bigr|\,dx\,dy\,dz = O(T\log^2 T')$$

by Lemma 7.3.
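Next, inserting the relation (9.1), we have

(9.3) $$|J_1| \le C \iiint_{S(\delta)} (|x| + |y| + |z|)\,\Bigl|\frac{\sin T(x + y + z)}{x + y + z}\,\frac{\sin Tx}{x}\,\frac{\sin T'y}{y}\,\frac{\sin T'z}{z}\Bigr|\,dx\,dy\,dz.$$

We consider, for instance, the following part of this integral:

$$\iiint_{S(\delta)} |z|\,\Bigl|\frac{\sin T(x + y + z)}{x + y + z}\,\frac{\sin Tx}{x}\,\frac{\sin T'y}{y}\,\frac{\sin T'z}{z}\Bigr|\,dx\,dy\,dz \le \iiint_{S(\delta)} \Bigl|\frac{\sin T(x + y + z)}{x + y + z}\,\frac{\sin Tx}{x}\,\frac{\sin T'y}{y}\Bigr|\,dx\,dy\,dz$$
$$= \int_{-\delta}^{\delta} \Bigl|\frac{\sin Tx}{x}\Bigr|\,dx \int_{-\delta}^{\delta} \Bigl|\frac{\sin T'y}{y}\Bigr|\,dy \int_{-\delta}^{\delta} \Bigl|\frac{\sin T(x + y + z)}{x + y + z}\Bigr|\,dz$$
$$\le \int_{-\delta}^{\delta} \Bigl|\frac{\sin Tx}{x}\Bigr|\,dx \int_{-\delta}^{\delta} \Bigl|\frac{\sin T'y}{y}\Bigr|\,dy \int_{-3\delta}^{3\delta} \Bigl|\frac{\sin Tu}{u}\Bigr|\,du = O(\log^2 T\,\log T').$$

Other parts in (9.3) may be estimated by similar arguments to be
$O(\log T\,\log^2 T')$ or $O(\log^2 T\,\log T')$.
Keeping (9.2) in mind, we get the proof of Lemma 7.4.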


10. Proofs of theorems. We are now in a position to prove the theorems in
Section 6. Throughout this section we of course assume all conditions stated in
these.

First of all we shall evaluate $EJ^2(T)$.

$$EJ^2(T) = \frac{1}{16\pi^2T^2}\,E\iiiint_{-T}^{T} \varepsilon(s)\varepsilon(t)\varepsilon(u)\varepsilon(v)\,e^{-i\{(s - t) - (u - v)\}\xi}\,ds\,dt\,du\,dv.$$

Inserting (6.2) here, we have

(10.1) $$EJ^2(T) = \frac{1}{16\pi^2T^2}\iiiint_{-T}^{T} Q(t - s, u - s, v - s)\,e^{-i\{(s - t) - (u - v)\}\xi}\,ds\,dt\,du\,dv$$
$$+\ \frac{1}{16\pi^2T^2}\iiiint_{-T}^{T} P_0(t - s, u - s, v - s)\,e^{-i\{(s - t) - (u - v)\}\xi}\,ds\,dt\,du\,dv = J_1 + J_2,$$

say. Because of (6.4), we have


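(10.2) $$J_1 = \frac{1}{16\pi^2T^2}\Bigl(\frac{1}{2\pi}\Bigr)^{3/2} \iiint q(x, y, z)\,dx\,dy\,dz \iiiint_{-T}^{T} e^{-is(x + y + z + \xi) + it(x + \xi) + iu(y + \xi) + iv(z - \xi)}\,ds\,dt\,du\,dv$$
$$= \Bigl(\frac{1}{2\pi}\Bigr)^{3/2}\frac{1}{\pi^2T^2} \iiint q(x, y, z)\,\frac{\sin T(x + y + z + \xi)}{x + y + z + \xi}\,\frac{\sin T(x + \xi)}{x + \xi}\,\frac{\sin T(y + \xi)}{y + \xi}\,\frac{\sin T(z - \xi)}{z - \xi}\,dx\,dy\,dz$$
$$= \Bigl(\frac{1}{2\pi}\Bigr)^{3/2}\frac{1}{\pi^2T^2} \iiint q(x - \xi, y - \xi, z + \xi)\,\frac{\sin T(x + y + z)}{x + y + z}\,\frac{\sin Tx}{x}\,\frac{\sin Ty}{y}\,\frac{\sin Tz}{z}\,dx\,dy\,dz.$$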


Now the integral (10.2) with $q(x, y, z) = 1$ is easily shown to be $\pi^3T$ by
the repeated applications of Parseval's relation. Hence we have

(10.3) $$J_1 = \Bigl(\frac{1}{2\pi}\Bigr)^{3/2}\frac{1}{\pi^2T^2} \iiint \{q(x - \xi, y - \xi, z + \xi) - q(-\xi, -\xi, \xi)\}\,\frac{\sin T(x + y + z)}{x + y + z}\,\frac{\sin Tx}{x}\,\frac{\sin Ty}{y}\,\frac{\sin Tz}{z}\,dx\,dy\,dz$$
$$+\ q(-\xi, -\xi, \xi)\Bigl(\frac{1}{2\pi}\Bigr)^{3/2}\frac{\pi}{T}.$$

Since $q(x, y, z)$ satisfies the Lipschitz condition at the point $(-\xi, -\xi, \xi)$, we
have, by Lemma 7.4 with $T = T'$,

(10.4) $$J_1 = O\Bigl(\frac{1}{T}\log^2 T\Bigr) + O\Bigl(\frac{1}{T}\Bigr) = o(1).$$

Next

(10.5) $$J_2 = \frac{1}{16\pi^2T^2}\iiiint_{-T}^{T} \{\rho(t - s)\rho(u - v) + \rho(u - s)\rho(v - t) + \rho(v - s)\rho(t - u)\}\,e^{-i\{(s - t) - (u - v)\}\xi}\,ds\,dt\,du\,dv.$$

Here we shall have, for instance,

(10.6) $$\frac{1}{16\pi^2T^2}\iiiint_{-T}^{T} \rho(t - s)\rho(u - v)\,e^{-i\{(s - t) - (u - v)\}\xi}\,ds\,dt\,du\,dv \to p^2(\xi),$$

as $T \to \infty$,

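because this will become, inserting $\rho(u) = \int_{-\infty}^{\infty} p(x)e^{iux}\,dx$, and making a change
of orders of integration,

$$\frac{1}{16\pi^2T^2}\int_{-\infty}^{\infty} p(x)\,dx \int_{-\infty}^{\infty} p(y)\,dy \iiiint_{-T}^{T} e^{i(t - s)(x + \xi) + i(u - v)(y + \xi)}\,dt\,ds\,du\,dv$$
$$= \frac{1}{\pi T}\int_{-\infty}^{\infty} p(x)\,\frac{\sin^2 T(x + \xi)}{(x + \xi)^2}\,dx \cdot \frac{1}{\pi T}\int_{-\infty}^{\infty} p(y)\,\frac{\sin^2 T(y + \xi)}{(y + \xi)^2}\,dy,$$

which tends obviously to $p^2(-\xi) = p^2(\xi)$ as $T \to \infty$ by the well known property
of Fejér's integral.
As for the second part of the integral of the right hand side of (10.5), we
obtain

(10.7) $$\frac{1}{16\pi^2T^2}\iiiint_{-T}^{T} \rho(u - s)\rho(v - t)\,e^{-i\{(s - t) - (u - v)\}\xi}\,ds\,dt\,du\,dv \to p^2(\xi),$$

as $T \to \infty$.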

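However we find a difference in considering the remaining part. In fact the
integral

$$\frac{1}{16\pi^2T^2}\iiiint_{-T}^{T} \rho(v - s)\rho(t - u)\,e^{-i\{(s - t) - (u - v)\}\xi}\,ds\,dt\,du\,dv$$

becomes, after similar treatments,

$$\Bigl[\frac{1}{\pi T}\int_{-\infty}^{\infty} p(x)\,\frac{\sin T(x + \xi)\,\sin T(x - \xi)}{(x + \xi)(x - \xi)}\,dx\Bigr]^2,$$

which converges to $p^2(0)$ if $\xi = 0$, and $0$ if $\xi \ne 0$, by Lemma 7.1. Hence combining
this result with (10.6), (10.7), we get

$$\lim_{T\to\infty} J_2 = \begin{cases} 3p^2(0), & \text{if } \xi = 0, \\ 2p^2(\xi), & \text{if } \xi \ne 0. \end{cases}$$

We finally get

(10.8) $$\lim_{T\to\infty} EJ^2(T) = \begin{cases} 3p^2(0), & \text{if } \xi = 0, \\ 2p^2(\xi), & \text{if } \xi \ne 0. \end{cases}$$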


Moreover, we determine here the limiting value of $E\,J(T)J(T')$. The same
method as above leads us to


T 7 
sia E I{ &(s)8(t)e "* ds dt Il &(u)8(v)e—”* du dv 
162°T7 Tr T 


T T’ 
2 arr IL, me IL. du dv-Q(t — 8,u—v,v — sje tees 


I ; ¥ —t| (e—-f)—(u—-9 
+ ioe TT" If. ds dt If. du dvP g(t — 8,u — 8,v — sje “PE 


K,(T, T’) + K2(T, T’) 


(10.9) 


say. In the same way that we obtained (10.2), we have


K,(T, T’) = (3 =) ar i: q(x — & y — &2 + &) 


_sin T(z + y + 2) sin Tz sin T’y sin T’z 
r+tytz x y 





dx dy dz. 


Now Parseval's relation shows that $K_1(T, T')$ with $q(x, y, z) = 1$ equals


(10.10) fe = ae ¥ 2 == sin 2 y sin ae dy dz = +°T, 


if T’ > T. Hence we have 


K,(T, T’) = (zx — & y — & ¢ + €) — a —&, —% &)} 
() ere SIL 


_ sin See desde 
etryte2 x y Zz 


1\°? x 
+ (1) qi —& —§, £) 


O (rp (log’ T log T’ + T log’ T’) + n): 





which is, by Lemma 2, 


Hence 


(10.11) $K_1(T, T') = o(1)$, as $T' > T \to \infty$.
Next, $K_2(T, T')$ becomes, after some manipulation,


DR 38 7. [E —i{ (e-)—-(u—v) | 
ioe? TP / fi ds dt = du dup(t — s)p(u — v)e 


T T’ 
+ al ds dt [I du dup(u — s)p(v — the te-9— one 
T T’ 


T TQ 
+ fl ds dt [/ du dup(v — s)p(t — ue ew 
—7 _7? 


= Ky + Kun + Kas, 


say. It will be seen that the asymptotic behavior differs considerably among
$K_{21}$, $K_{22}$ and $K_{23}$.
In fact $K_{21}$ becomes, by an argument like that used in considering (10.6),


* sin’ T(x + £) p(z) ax [ sin’ T'(y - 


- Tet) Py oF MO) a 
which tends to $p(\xi)p(-\xi) = p^2(\xi)$ as $T, T' \to \infty$. Thus

\[
\lim_{T, T' \to \infty} K_{21}(T, T') = p^2(\xi).
\tag{10.12}
\]

Next we see that $K_{22}(T, T')$ becomes


p(x) dx 








1 [ sin T(x + €) sin T’(y + €) 
« (x + &)? 
, - sin T(y — &) sin T’(y — &) 


— ~ iy — oF a p(y) dy. 


Then Lemma 7.2, (7.2) shows

\[
\lim_{T' > T \to \infty} \Bigl\{ K_{22}(T, T') - p^2(\xi)\,\frac{T}{T'} \Bigr\} = 0.
\tag{10.13}
\]

Finally $K_{23}$ becomes, after some manipulation,


11 sin (x + £)T sin T’(x — £) r 
PIT ts @+d@-9 in| 


which converges to $0$ if $\xi \ne 0$, by Lemma 7.2, (7.3). On the other hand, if $\xi = 0$,
then it reduces to $K_{22}$ and

\[
\lim_{T' > T \to \infty} \Bigl\{ K_{23}(T, T') - p^2(0)\,\frac{T}{T'} \Bigr\} = 0.
\]

Combining (10.12), (10.13) and the last result, we have

\[
\lim_{T' > T \to \infty} \Bigl\{ K_2(T, T') - \Bigl(1 + \frac{T}{T'}\Bigr) p^2(\xi) \Bigr\} = 0, \quad \text{if } \xi \ne 0,
\tag{10.14}
\]

\[
\lim_{T' > T \to \infty} \Bigl\{ K_2(T, T') - \Bigl(1 + \frac{2T}{T'}\Bigr) p^2(0) \Bigr\} = 0, \quad \text{if } \xi = 0.
\tag{10.15}
\]

Hence putting (10.11), (10.14) and (10.15) into (10.9) we get (6.9) and (6.10).
After these preparations, the proofs of the theorems are now very easy. For

\[
E\{J(T) - J(T')\}^2 - \Bigl(1 - \frac{T}{T'}\Bigr) 2p^2(\xi)
= \bigl(EJ^2(T) - 2p^2(\xi)\bigr) + \bigl(EJ^2(T') - 2p^2(\xi)\bigr)
- 2\Bigl(EJ(T)J(T') - \Bigl(1 + \frac{T}{T'}\Bigr) p^2(\xi)\Bigr),
\]

which tends to zero by virtue of (10.8) and (6.9). Formula (6.6) is also proved
using (10.8) and (6.10).
The proof of Theorem 6.2 is also immediate. 
LARGE EXCURSIONS OF GAUSSIAN PROCESSES


By Mark Kac and David Slepian
Cornell University and Bell Telephone Laboratories, Inc.


1. Introduction. It is known that the problem of determining the distribution
of spacings between consecutive $a$-values of an ergodic Gaussian process, $x(t)$,
($Ex(t) = 0$, $Ex^2(t) = 1$) is very difficult. Recently Palmer [1] and Rice [2]
treated some limiting cases of this problem. In one limit they determine, for
$a \to \infty$, the conditional probability

\[
\Pr\{x(\tau) > a,\ 0 \le \tau \le t\,\theta(a) \mid x(0) = a,\ x'(0) > 0\}
\tag{1.1}
\]

where $\theta(a)$ is the average length of the times spent by $x(t)$ above the level $a$.
Apart from some differences concerning the meaning of the conditional
probability (1.1), both authors use the following heuristic device.
Since for large $a$, $\theta(a)$ is small, they write

\[
x(\tau) = a + x'(0)\tau + \frac{x''(0)}{2}\tau^2
\tag{1.2}
\]

and take for the time of the first downward crossing of the $a$-level

\[
\tau = -\frac{2x'(0)}{x''(0)}.
\tag{1.3}
\]

It would thus seem that this procedure is limited to processes for which $x''$
exists. This would exclude, for example, the displacement of a harmonic oscillator
in Brownian motion. It is precisely this point that led us to undertake the present
investigation.

We have found an alternative derivation of the Palmer-Rice results which
does not depend on the approximation (1.2) and hence is applicable to all cases
of physical interest. We have also attempted to elucidate the ambiguity of (1.1)
(see §2) and we have in §3 shown in what sense the sample functions $x(\tau)$ are
approximated by parabolas as suggested in (1.2).


2. Conditional probability densities. It is well known that conditional
probabilities and conditional probability densities must frequently be treated with
some care. Since the material to follow contains some excellent examples of the
subtle nature of these quantities, a few words on the subject are in order here.

Let $x(t)$ be a continuous ergodic Gaussian process possessing a derivative
almost everywhere. Consider the "conditional probability density for the slope
$\xi = x'(0)$ given that $x(0) = a$." From the ensemble point of view, the phrase
in quotation marks has no meaning, since the set of sample functions satisfying
the condition $x(0) = a$ is of probability zero. Yet, given a sample function of
the process, one can imagine observing the slope of $x(t)$ at each value of $t$ for
which $x(t) = a$ and one can thus obtain an "empirical or time derived
probability density for $x'(t)$ given that $x(t) = a$." This probability density will be the
same for almost all sample functions. How do we reconcile these two points of
view?

From the ensemble point of view, one can, of course, give meaning to the
"probability density for $\xi = x'(0)$ given that $x(0) = a$" by means of limiting
procedures. The condition $x(0) = a$ is replaced by some condition, $A$, of positive
probability depending on parameters. The condition is chosen so that as the
parameters assume limiting values, $A$ becomes the condition $x(0) = a$. The
conditional density of $\xi$ given $A$, $p(\xi \mid A)$, is computed; the limit of this
quantity as the parameters assume their limiting values can then be taken as a
definition of $p(\xi \mid x(0) = a)$, the "density for $\xi$ given that $x(0) = a$."

Unfortunately this limit depends in general on the manner in which $A$
approaches the condition $x(0) = a$. We illustrate with a few examples.

(i) Let $A$ be $a \le x(0) \le a + \delta$. Then

\[
p(\xi \mid x(0) = a)_{\text{v.w.}}
= \lim_{\delta \to 0} \frac{\int_a^{a+\delta} p(\xi, x)\,dx}{\int_a^{a+\delta} p(x)\,dx}
= \frac{e^{-\xi^2/2\alpha}}{\sqrt{2\pi\alpha}},
\]

where the subscript v.w. stands for "vertical window," $p(\xi, x)$ is the joint density
for $\xi = x'(0)$ and $x(0)$, and $p(x)$ is the density for $x(0)$. We have made use
of the independence of $x(0)$ and $\xi$ and have assumed that $Ex(t) = 0$ and $E\xi^2 = \alpha$.
This vertical window definition of the conditional density of $\xi$ given that $x(0) = a$
thus reduces to the conventional one, $p(\xi \mid x(0) = a) = [p(\xi, x)/p(x)]_{x=a}$.

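(ii) Let $A$ be the "horizontal window condition" $x(t) = a$ for some $t$ such
that $0 \le t \le \delta$. Then if $\xi \ge 0$,

\[
p(\xi \mid x(0) = a)_{\text{h.w.}}
= \lim_{\delta \to 0}
\frac{\int_{a-\xi\delta}^{a} p(\xi, x)\,dx}
{\int_0^{\infty} d\xi' \int_{a-\xi'\delta}^{a} dx\, p(\xi', x) + \int_{-\infty}^{0} d\xi' \int_{a}^{a-\xi'\delta} dx\, p(\xi', x)}
= \frac{\xi}{2\alpha}\, e^{-\xi^2/2\alpha},
\tag{2.1}
\]

since the condition $A$ can be satisfied (to first order in small quantities) for a
given value of slope, say $\xi' > 0$, only if $a - \xi'\delta \le x(0) \le a$. A similar
calculation for $\xi < 0$ gives the final result

\[
p(\xi \mid x(0) = a)_{\text{h.w.}} = \frac{|\xi|}{2\alpha}\, e^{-\xi^2/2\alpha}.
\]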


(iii) More generally, let $A$ be the condition that $x(t)$ pass through a line
segment of length $\delta$ and slope $m$ having one end-point at $x = a$, $t = 0$. Then one
finds by straightforward computation

\[
p(\xi \mid x(0) = a)_{m}
= \frac{|\xi - m|\, e^{-\xi^2/2\alpha}}
{2\alpha\, e^{-m^2/2\alpha} + m \int_{-m}^{m} e^{-x^2/2\alpha}\,dx}.
\]

(iv) If $A$ is the condition that $x(t)$ pass through a circle of radius $\delta$ with center
at the point $x = a$, $t = 0$, then

\[
p(\xi \mid x(0) = a)_{\text{circle}}
= \frac{\sqrt{1 + \xi^2}\, e^{-\xi^2/2\alpha}}
{\int_{-\infty}^{\infty} \sqrt{1 + \xi^2}\, e^{-\xi^2/2\alpha}\,d\xi}.
\]

Which, if any, of these several versions of $p(\xi \mid x(0) = a)$ is equal to the
empirical density obtained from a single sample function? The question can be
answered readily in the following heuristic manner. Let $\nu$ be the expected
number of zeros per unit time of $x(t) - a$. Let $S_b(y) = 1$ if $y \le b$ and be zero
otherwise. The empirical cumulative distribution for $\xi$ can be written

\[
\begin{aligned}
\Pr(\xi \le b \mid x(0) = a)_{\text{emp}}
&= \lim_{T \to \infty} \frac{1}{\nu T} \int_0^T \delta[x(t) - a]\, |x'(t)|\, S_b[x'(t)]\,dt \\
&= \frac{1}{\nu}\, E\bigl\{\delta[x(t) - a]\, |x'(t)|\, S_b[x'(t)]\bigr\} \\
&= \frac{1}{\nu} \int_{-\infty}^{\infty} d\xi \int_{-\infty}^{\infty} dx\, \delta(x - a)\, |\xi|\, S_b(\xi)\, p(\xi, x) \\
&= \frac{1}{\nu} \int_{-\infty}^{b} d\xi\, |\xi|\, p(\xi, a).
\end{aligned}
\]

Here we have appealed to the ergodic theorem. The empirical density for $\xi$ then
follows by differentiating with respect to $b$,

\[
p(\xi \mid x(0) = a)_{\text{emp}} = \frac{1}{\nu}\, |\xi|\, p(\xi, a).
\tag{2.2}
\]

Now the denominator of (2.1) is the probability that $x(t) - a$ have a zero in
the small time interval $\delta$. Evaluating the integrals, one finds this probability to
be $\sqrt{2\alpha/\pi}\, p(a)\,\delta$ to first order in $\delta$. It follows then that $\nu = \sqrt{2\alpha/\pi}\, p(a)$.
Inserting this value in (2.2) yields

\[
p(\xi \mid x(0) = a)_{\text{emp}} = \frac{|\xi|}{2\alpha}\, e^{-\xi^2/2\alpha} = p(\xi \mid x(0) = a)_{\text{h.w.}}.
\]
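The identification of the empirical density with the horizontal-window density can be checked numerically. The following is a minimal sketch, assuming a unit-variance process with $E\xi^2 = \alpha$; the values of `alpha` and `a` and the helper `integrate` are illustrative choices, not taken from the paper.

```python
import math

def p_vertical(xi, alpha):
    """Vertical-window density: the N(0, alpha) density of the slope."""
    return math.exp(-xi * xi / (2 * alpha)) / math.sqrt(2 * math.pi * alpha)

def p_horizontal(xi, alpha):
    """Horizontal-window density |xi| exp(-xi^2 / (2 alpha)) / (2 alpha)."""
    return abs(xi) * math.exp(-xi * xi / (2 * alpha)) / (2 * alpha)

def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule; adequate for this illustration only."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

alpha = 1.7                      # assumed value of E[x'(0)^2]
a = 0.8                          # assumed level
p_a = math.exp(-a * a / 2) / math.sqrt(2 * math.pi)   # density of x(0) at a
L = 12 * math.sqrt(alpha)        # integration cutoff, far in the Gaussian tail

# nu = p(a) * E|xi| is the expected number of a-crossings per unit time.
nu = p_a * integrate(lambda s: abs(s) * p_vertical(s, alpha), -L, L)

# Empirical density (2.2): |xi| p(xi, a) / nu, with p(xi, a) = p(xi) p(a).
def p_empirical(xi):
    return abs(xi) * p_vertical(xi, alpha) * p_a / nu

hw_mass = integrate(lambda s: p_horizontal(s, alpha), -L, L)
```

Under these assumptions `nu` should agree with $\sqrt{2\alpha/\pi}\,p(a)$, `hw_mass` should be close to 1, and `p_empirical` should coincide with `p_horizontal` pointwise.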


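It might be mentioned that the interpretation of conditional probabilities in
the h.w. sense is intimately connected with the definition of the mean recurrence
time as introduced into statistical physics by Smoluchowski.
Let $x(t)$ be an ergodic process and consider the discrete observations $x(0)$,
$x(\tau)$, $x(2\tau)$, $\ldots$ . For these observations the mean recurrence time of the state
defined by $|x(t)| < \delta$ is given by Smoluchowski's formula

\[
\theta_{\delta,\tau} = \tau\, \frac{1 - \Pr\{|x(0)| < \delta\}}{\Pr\{|x(0)| < \delta,\ |x(\tau)| > \delta\}},
\]

which he derived by direct time average considerations. (For a derivation of this
formula as well as a discussion of its connection with the ergodic theorem see
[3].) Denoting by $W(x)$ the probability density of $x(t)$ and by $W_\tau(x, y)$ the joint
probability density of $x(t)$ and $x(t + \tau)$, we get

\[
\theta_{\delta,\tau} = \tau\, \frac{1 - \int_{-\delta}^{\delta} W(x)\,dx}{\iint_{|x| < \delta,\ |y| > \delta} W_\tau(x, y)\,dx\,dy}.
\]

For a Gaussian ergodic process for which $x'(t)$ is defined, we can go further.
Since

\[
x(\tau) \sim x(0) + \tau x'(0)
\]

and $x(0)$ and $x'(0)$ are independent, we have

\[
\iint_{|x| < \delta,\ |y| > \delta} W_\tau(x, y)\,dx\,dy
\sim \frac{1}{2\pi\sqrt{\alpha}} \iint_{|x| < \delta,\ |x + \tau\xi| > \delta}
e^{-\left(\frac{x^2}{2} + \frac{\xi^2}{2\alpha}\right)}\,dx\,d\xi,
\]

where $\alpha = E[x'(0)]^2$. Now

\[
\iint_{|x| < \delta,\ |x + \tau\xi| > \delta} e^{-\left(\frac{x^2}{2} + \frac{\xi^2}{2\alpha}\right)}\,dx\,d\xi
= \int_{-\delta}^{\delta} dx\, e^{-x^2/2} \int_{(\delta - x)/\tau}^{\infty} d\xi\, e^{-\xi^2/2\alpha}
+ \int_{-\delta}^{\delta} dx\, e^{-x^2/2} \int_{-\infty}^{-(\delta + x)/\tau} d\xi\, e^{-\xi^2/2\alpha}.
\]

In the first of these integrals set $x = \delta - y\tau$. In the second, put $x = -\delta - y\tau$.
There results

\[
\tau \int_0^{2\delta/\tau} dy\, e^{-(\delta - y\tau)^2/2} \int_y^{\infty} d\xi\, e^{-\xi^2/2\alpha}
+ \tau \int_{-2\delta/\tau}^{0} dy\, e^{-(\delta + y\tau)^2/2} \int_{-\infty}^{y} d\xi\, e^{-\xi^2/2\alpha},
\]

and hence

\[
\lim_{\delta \to 0,\ \tau \to 0} \theta_{\delta,\tau}
= \Bigl[ \frac{1}{\pi\sqrt{\alpha}} \int_0^{\infty} y\, e^{-y^2/2\alpha}\,dy \Bigr]^{-1}
= \frac{\pi}{\sqrt{\alpha}},
\]

which agrees with the known result of Rice for the mean distance between zeros.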
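The limit can be illustrated numerically within the linearized model $x(\tau) \approx x(0) + \tau x'(0)$. The sketch below is only a sanity check under stated assumptions: the function names and the values of `delta`, `tau` and `alpha` are hypothetical, and $\tau$ is taken much smaller than $\delta$ so that the state $|x| < \delta$ is left only near its edges.

```python
import math

def normal_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def recurrence_time(delta, tau, alpha, n=20000):
    """Smoluchowski's formula with x(tau) replaced by x(0) + tau * x'(0).

    x(0) ~ N(0, 1) and x'(0) ~ N(0, alpha) are independent.
    """
    s = tau * math.sqrt(alpha)           # standard deviation of tau * x'(0)
    h = 2 * delta / n
    leave = 0.0                          # Pr{|x(0)| < delta, |x(tau)| > delta}
    for k in range(n):
        x = -delta + (k + 0.5) * h       # midpoint rule over |x| < delta
        phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        p_out = normal_tail((delta - x) / s) + normal_tail((delta + x) / s)
        leave += phi * p_out * h
    inside = 1 - 2 * normal_tail(delta)  # Pr{|x(0)| < delta}
    return tau * (1 - inside) / leave

alpha = 2.0
theta = recurrence_time(delta=0.01, tau=1e-4, alpha=alpha)
rice = math.pi / math.sqrt(alpha)        # mean distance between zeros
```

With these illustrative values `theta` differs from `rice` by roughly one percent, the discrepancy being of the order of $\Pr\{|x(0)| < \delta\}$.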


3. Joint distribution for large positive excursions. Let $x(t)$ be a continuous
parameter ergodic differentiable Gaussian process with mean zero and
covariance function $\rho(\tau)$. For convenience we choose $\rho(0) = 1$ and assume that in
some interval about $\tau = 0$,

\[
\rho(\tau) = 1 - \frac{\alpha}{2}\tau^2 + o(\tau^2), \qquad \alpha > 0.
\tag{3.1}
\]

Let $\theta = \theta(a)$ denote the expected length of the intervals during which $x(t) \ge a$.
Then [2]

\[
\theta = \frac{2\pi}{\sqrt{\alpha}}\, e^{a^2/2}\, \frac{1}{\sqrt{2\pi}} \int_a^{\infty} e^{-x^2/2}\,dx
\tag{3.2}
\]

and

\[
\theta \sim \frac{1}{a} \sqrt{\frac{2\pi}{\alpha}}
\tag{3.3}
\]

for large positive $a$.
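As a rough numerical illustration of how quickly (3.3) takes over, the sketch below evaluates the exact mean excursion length against its asymptotic form. It assumes that (3.2) is Rice's expression $\theta = (2\pi/\sqrt{\alpha})\, e^{a^2/2}\, \Pr\{x > a\}$ for a unit-variance process; the function names are illustrative.

```python
import math

def theta_exact(a, alpha):
    """Mean length of an excursion above level a (assumed form of (3.2))."""
    tail = 0.5 * math.erfc(a / math.sqrt(2))     # Pr{x(t) > a}
    return 2 * math.pi / math.sqrt(alpha) * math.exp(a * a / 2) * tail

def theta_asymptotic(a, alpha):
    """Large-a form (3.3): sqrt(2 pi / alpha) / a."""
    return math.sqrt(2 * math.pi / alpha) / a

alpha = 1.5
ratios = {a: theta_exact(a, alpha) / theta_asymptotic(a, alpha) for a in (2.0, 5.0, 10.0)}
```

The ratio approaches 1 from below as $a$ grows, roughly like $1 - 1/a^2$, which follows from the standard expansion of the Gaussian tail.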
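In this and the next several sections, we study some limiting properties of
the related process

\[
\Delta(t, \theta) = \frac{x(\theta t) - a}{\theta}
\tag{3.4}
\]

as $a \to +\infty$ (or, equivalently, as $\theta \to 0$ through positive values). We shall
generally be concerned only with properties of $\Delta(t, \theta)$ conditioned by

\[
\Delta'(0, \theta) = \left.\frac{\partial \Delta(t, \theta)}{\partial t}\right|_{t=0} \ge 0
\quad \text{and} \quad \Delta(0, \theta) = 0
\]

in either the h.w. or v.w. sense of Section 2.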

The main result of this section is that, as $a \to \infty$, the $n$-dimensional joint
distribution function of $\Delta(t_1, \theta), \Delta(t_2, \theta), \ldots, \Delta(t_n, \theta)$ conditioned in the v.w.
sense by $\Delta'(0, \theta) \ge 0$, $\Delta(0, \theta) = 0$ approaches the singular $n$-dimensional
half-normal distribution function of the random variables

\[
\Delta_i = -\sqrt{\frac{\alpha\pi}{2}}\, t_i^2 + \sqrt{\alpha}\, t_i\, \xi,
\tag{3.5}
\]

where $\xi$ is a random variable with probability density

\[
p(\xi) =
\begin{cases}
0, & \xi < 0,\\
\sqrt{2/\pi}\; e^{-\xi^2/2}, & \xi \ge 0.
\end{cases}
\tag{3.6}
\]

If the conditioning is done in the h.w. sense, the result remains the same except
that $\xi$ now has the Rayleigh density

\[
p(\xi) =
\begin{cases}
0, & \xi < 0,\\
\xi\, e^{-\xi^2/2}, & \xi \ge 0.
\end{cases}
\tag{3.7}
\]

In one sense, then, as $a \to \infty$, the sample functions of the conditioned $\Delta(t, \theta)$
process become a family of random parabolas,

\[
\Delta = -\sqrt{\frac{\alpha\pi}{2}}\, t^2 + \sqrt{\alpha}\, t\, \xi,
\tag{3.8}
\]

where $\xi$ has either a half-normal or Rayleigh distribution according as $\Delta(0, \theta)$
is conditioned to be zero in the v.w. or h.w. sense. In terms of the original $x(t)$
process, one can say that, when properly scaled and normalized, the excursions
of $x(t)$ above the level $a$ approach parabolas as $a \to +\infty$.

It is worth noting that these results require only the existence of the first
derivative of $x(t)$. Processes with pathologies only in higher order derivatives,
such as the harmonic oscillator of physics, are sufficiently "tamed" by the
normalization and scaling indicated in (3.4) to give the limits mentioned.

We obtain the limiting conditional distribution function for $\Delta(t_1, \theta), \ldots,
\Delta(t_n, \theta)$ by computing the characteristic function, $\varphi_a(\eta_1, \eta_2, \ldots, \eta_n)$, for these
quantities and determining the limiting function

\[
\varphi(\eta_1, \ldots, \eta_n) = \lim_{a \to \infty} \varphi_a(\eta_1, \ldots, \eta_n).
\]

By a well-known theorem ([4], p. 102), $\varphi(\eta_1, \ldots, \eta_n)$, if continuous at
$\eta_1 = \eta_2 = \cdots = \eta_n = 0$, is the characteristic function of the limiting distribution
function for $\Delta(t_1, \theta), \ldots, \Delta(t_n, \theta)$.

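Let $\xi = x'(0)$, $x_i = x(\theta t_i)$, $i = 0, 1, 2, \ldots, n$, and $t_0 = 0$. Then

\[
\begin{aligned}
p(x_1, \ldots, x_n \mid \xi \ge 0,\ x_0 = a)_{\text{v.w.}}
&= \frac{\int_0^{\infty} d\xi\, p(\xi, x_0, x_1, \ldots, x_n)\big|_{x_0 = a}}{\tfrac{1}{2}\, p(x_0)\big|_{x_0 = a}} \\
&= \frac{\int_0^{\infty} d\xi\, p(x_1, \ldots, x_n \mid \xi,\ x_0 = a)\, p(\xi, x_0)\big|_{x_0 = a}}{\tfrac{1}{2}\, p(x_0)\big|_{x_0 = a}} \\
&= 2 \int_0^{\infty} d\xi\, p(x_1, \ldots, x_n \mid \xi,\ x_0 = a)\, p(\xi).
\end{aligned}
\tag{3.9}
\]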

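Now $p(x_1, \ldots, x_n \mid \xi,\ x_0 = a)$ is an $n$-variate Gaussian density (see Appendix).
The conditional means and covariances are readily computed:

\[
m_i = E(x_i \mid \xi,\ x_0 = a) = \rho(\theta t_i)\, a - \frac{1}{\alpha}\, \rho'(\theta t_i)\, \xi
\tag{3.10}
\]

and

\[
\begin{aligned}
\lambda_{ij} &= E[(x_i - m_i)(x_j - m_j) \mid \xi,\ x_0 = a] \\
&= \rho[\theta(t_i - t_j)] - \rho(\theta t_i)\rho(\theta t_j) - \frac{1}{\alpha}\, \rho'(\theta t_i)\rho'(\theta t_j),
\qquad i, j = 1, 2, \ldots, n.
\end{aligned}
\tag{3.11}
\]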
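One can write, therefore,

\[
p(x_1, \ldots, x_n \mid \xi,\ x_0 = a)
= (2\pi)^{-n} \int_{-\infty}^{\infty} d\eta_1 \cdots \int_{-\infty}^{\infty} d\eta_n\,
\exp\Bigl[\, i \sum_j \eta_j (x_j - m_j) - \tfrac{1}{2} \sum_{j,k} \lambda_{jk}\, \eta_j \eta_k \Bigr].
\]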


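Introduce this expression into (3.9), substitute $x_i = a + \theta\Delta_i$ and multiply the
entire expression by $\theta^n$ to obtain the conditional density function for
$\Delta(t_1, \theta), \ldots, \Delta(t_n, \theta)$ in the form

\[
p_\theta(\Delta_1, \ldots, \Delta_n \mid \Delta'(0, \theta) \ge 0,\ \Delta(0, \theta) = 0)_{\text{v.w.}}
= \frac{2\theta^n}{(2\pi)^n} \int_0^{\infty} d\xi \int_{-\infty}^{\infty} d\eta_1 \cdots \int_{-\infty}^{\infty} d\eta_n\,
e^{\, i \sum_j \eta_j (a + \theta\Delta_j - m_j) - \frac{1}{2} \sum_{j,k} \lambda_{jk} \eta_j \eta_k}\,
\frac{e^{-\xi^2/2\alpha}}{\sqrt{2\pi\alpha}}.
\]

Let $\xi = \sqrt{\alpha}\, \xi'$, $\theta\eta_i = \eta_i'$, $i = 1, \ldots, n$. Introduce the value of $m_i$ given in
(3.10), interchange the order of integration, which is a step easily justified, and
omit the primes. There results

\[
p_\theta(\Delta_1, \ldots, \Delta_n \mid \Delta'(0, \theta) \ge 0,\ \Delta(0, \theta) = 0)_{\text{v.w.}}
= \frac{1}{(2\pi)^n} \int_{-\infty}^{\infty} d\eta_1 \cdots \int_{-\infty}^{\infty} d\eta_n\,
e^{\, i \sum_j \eta_j \Delta_j}\, \varphi_a(\eta_1, \ldots, \eta_n),
\]

where

\[
\varphi_a(\eta_1, \ldots, \eta_n)
= \exp\Bigl[\, i\, \frac{a}{\theta} \sum_j \eta_j [1 - \rho(\theta t_j)] - \frac{1}{2\theta^2} \sum_{j,k} \lambda_{jk}\, \eta_j \eta_k \Bigr]
\sqrt{\frac{2}{\pi}} \int_0^{\infty} d\xi\,
\exp\Bigl[ -\frac{\xi^2}{2} + \frac{i\xi}{\theta\sqrt{\alpha}} \sum_j \eta_j\, \rho'(\theta t_j) \Bigr].
\]

On using (3.1), (3.3) and (3.11), one finds

\[
\varphi(\eta_1, \ldots, \eta_n) = \lim_{a \to \infty} \varphi_a(\eta_1, \ldots, \eta_n)
= e^{\, i \sqrt{\alpha\pi/2}\, \sum_j \eta_j t_j^2}\,
\sqrt{\frac{2}{\pi}} \int_0^{\infty} d\xi\, e^{-\xi^2/2 - i\sqrt{\alpha}\, \xi \sum_j \eta_j t_j}.
\]

But this expression, which is continuous at $\eta_1 = \cdots = \eta_n = 0$, is just the
characteristic function associated with the random variables (3.5), (3.6), as a trivial
computation shows.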


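The determination of the limiting form of the joint distribution function for
$\Delta(t_1, \theta), \ldots, \Delta(t_n, \theta)$ conditioned in the h.w. sense by $\Delta'(0, \theta) \ge 0$, $\Delta(0, \theta) = 0$
proceeds in a similar manner. Here (3.9) is replaced by

\[
\begin{aligned}
p(x_1, \ldots, x_n \mid \xi \ge 0,\ x_0 = a)_{\text{h.w.}}
&= \frac{\int_0^{\infty} d\xi\, \xi\, p(\xi, x_0, x_1, \ldots, x_n)\big|_{x_0 = a}}{\int_0^{\infty} d\xi\, \xi\, p(\xi, x_0)\big|_{x_0 = a}} \\
&= \sqrt{\frac{2\pi}{\alpha}} \int_0^{\infty} d\xi\, \xi\, p(x_1, \ldots, x_n \mid \xi,\ x_0 = a)\, p(\xi).
\end{aligned}
\]

The remaining steps are as in the previous demonstration.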
4. Asymptotic distribution of first return time to positive level. We now
assume that

\[
\rho(\tau) = 1 - \frac{\alpha}{2}\tau^2 + \frac{c_3}{3!}|\tau|^3 + \frac{c_4}{4!}\tau^4 + o(\tau^4).
\tag{4.1}
\]

Let $P_a(T)$ be the probability that $\Delta(t, \theta)$ be non-negative for $0 \le t \le T$
given that $\Delta'(0, \theta) \ge 0$ and $\Delta(0, \theta) = 0$ in the h.w. sense. Then $Q_a(T) =
-(dP_a/dT)$ is the probability density for the duration of the positive excursions
of the $\Delta(t, \theta)$ process conditioned in the h.w. sense. In this section we show that

\[
Q(T) = \lim_{a \to \infty} Q_a(T) =
\begin{cases}
\dfrac{\pi}{2}\, T\, e^{-\pi T^2/4}, & T \ge 0,\\
0, & T < 0,
\end{cases}
\tag{4.2}
\]

and

\[
P(T) = \lim_{a \to \infty} P_a(T) =
\begin{cases}
e^{-\pi T^2/4}, & T \ge 0,\\
1, & T < 0.
\end{cases}
\tag{4.3}
\]

If the conditioning is done in the v.w. sense, the corresponding results are

\[
Q(T) =
\begin{cases}
e^{-\pi T^2/4}, & T \ge 0,\\
0, & T < 0,
\end{cases}
\tag{4.4}
\]

\[
P(T) =
\begin{cases}
1 - \displaystyle\int_0^T e^{-\pi x^2/4}\,dx, & T \ge 0,\\
1, & T < 0.
\end{cases}
\tag{4.5}
\]

These results are consistent with the interpretation of the sample functions of
the limiting $\Delta$ process as the family of random parabolas (3.8) with $\xi$ distributed
according to (3.6) or (3.7). Note that the results (4.2)-(4.5) are independent
of the parameters defining $\rho(\tau)$. All differentiable ergodic Gaussian processes,
when scaled as here, have the same asymptotic distribution for the duration of
excursions above a level.
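The consistency with the random-parabola picture can be illustrated by simulation. The sketch below assumes the parabola $\Delta = -\sqrt{\alpha\pi/2}\,t^2 + \sqrt{\alpha}\,t\,\xi$, whose positive root is $t = \xi\sqrt{2/\pi}$ whatever the value of $\alpha$; the sampler names, seed and sample size are illustrative choices.

```python
import math
import random

def excursion_duration(xi):
    """Positive root of -sqrt(alpha*pi/2) t^2 + sqrt(alpha) t xi = 0 (alpha cancels)."""
    return xi * math.sqrt(2 / math.pi)

def sample_rayleigh(rng):
    """Rayleigh variable with density xi * exp(-xi^2 / 2), by inversion."""
    return math.sqrt(-2 * math.log(1 - rng.random()))

def sample_half_normal(rng):
    """Half-normal variable with density sqrt(2/pi) * exp(-xi^2 / 2)."""
    return abs(rng.gauss(0, 1))

rng = random.Random(0)
n = 200_000
T = 1.0

# Fraction of parabolas still non-negative at time T under each conditioning.
hw = sum(excursion_duration(sample_rayleigh(rng)) > T for _ in range(n)) / n
vw = sum(excursion_duration(sample_half_normal(rng)) > T for _ in range(n)) / n

hw_limit = math.exp(-math.pi * T * T / 4)          # h.w. survival probability
vw_limit = math.erfc(math.sqrt(math.pi) * T / 2)   # equals 1 - int_0^T exp(-pi x^2/4) dx
```

The identity used for `vw_limit` is $\int_0^T e^{-\pi x^2/4}\,dx = \operatorname{erf}(\sqrt{\pi}\,T/2)$. The simulated fractions should match the two limits to within sampling error.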

To compute $P_a(T)$, we make use of the method of "inclusion and exclusion"
[5], p. 89, in a manner analogous to that of Rice [6], p. 70. Let $A_i$ be the event
"$x(t)$ assumes the value $a$ for some value of $t$ such that

\[
i(T/n) \le t < (i + 1)(T/n)
\]

given that $x'(0) \ge 0$ and $x(0) = a$ in the h.w. sense." Then the probability,
$W_a(T)$, that $x(t)$ be not less than $a$ for $0 \le t \le T$ given that $x'(0) \ge 0$ and
$x(0) = a$ in the h.w. sense is

\[
W_a(T) = 1 - \sum_i \Pr[A_i] + \sum_{i<j} \Pr[A_i \cap A_j] - \sum_{i<j<k} \Pr[A_i \cap A_j \cap A_k] + \cdots.
\]

In the limit as $n \to \infty$, this becomes

\[
W_a(T) = 1 - \frac{1}{1!} \int_0^T dt_1\, p_1(t_1)
+ \frac{1}{2!} \int_0^T dt_1 \int_0^T dt_2\, p_2(t_1, t_2)
- \frac{1}{3!} \int_0^T dt_1 \int_0^T dt_2 \int_0^T dt_3\, p_3(t_1, t_2, t_3) + \cdots
\]

where

\[
p_j(t_1, \ldots, t_j)\, dt_1 \cdots dt_j
\]

is the probability that $x(t)$ assumes the value $a$ in each of the intervals
$(t_1, t_1 + dt_1), \ldots, (t_j, t_j + dt_j)$ given that $x'(0) \ge 0$ and $x(0) = a$ in the h.w.
sense.

One has then, since $P_a(T) = W_a(\theta T)$,

\[
P_a(T) = 1 - \theta \int_0^T dt_1\, p_1(\theta t_1)
+ \frac{\theta^2}{2!} \int_0^T dt_1 \int_0^T dt_2\, p_2(\theta t_1, \theta t_2) - \cdots
\tag{4.6}
\]

and by differentiation

\[
Q_a(T) = \theta\, p_1(\theta T) - \theta^2 \int_0^T dt_1\, p_2(\theta t_1, \theta T) + \cdots.
\tag{4.7}
\]
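Here

\[
p_n(\theta t_1, \ldots, \theta t_n)
= \frac{\displaystyle\int_0^{\infty} d\xi_0 \int_{-\infty}^{\infty} d\xi_1 \cdots \int_{-\infty}^{\infty} d\xi_n\;
\xi_0\, |\xi_1| \cdots |\xi_n|\; p(\xi_0, \ldots, \xi_n, x_0, \ldots, x_n)\big|_{x_0 = \cdots = x_n = a}}
{\sqrt{\alpha/2\pi}\; p(x_0)\big|_{x_0 = a}},
\tag{4.8}
\]

\[
x_i = x(\theta t_i), \quad \xi_i = x'(\theta t_i), \quad t_0 = 0, \quad i = 0, 1, \ldots, n,
\]

and $p$ denotes the joint density of the random variables indicated.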

From the derivation of the method of inclusion and exclusion, the successive
partial sums of (4.6) and (4.7) alternately over-estimate and under-estimate
the limit sum. Therefore

\[
0 \le P_a(T) - 1 + \theta \int_0^T dt_1\, p_1(\theta t_1)
\le \frac{\theta^2}{2} \int_0^T dt_1 \int_0^T dt_2\, p_2(\theta t_1, \theta t_2),
\tag{4.9}
\]

\[
0 \ge Q_a(T) - \theta\, p_1(\theta T) \ge -\theta^2 \int_0^T dt_1\, p_2(\theta t_1, \theta T).
\tag{4.10}
\]

We establish (4.2) and (4.3) by evaluating $\lim_{a \to \infty} \theta p_1(\theta t_1)$ and by showing the
right members of (4.9) and (4.10) approach zero as $a \to \infty$. A completely
analogous procedure gives (4.4) and (4.5).
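The alternating over- and under-estimation used above is the familiar Bonferroni property of inclusion and exclusion. A minimal sketch for the simplest case, $n$ independent events of equal probability $q$ (an assumption made purely for illustration; the events $A_i$ in the text are not independent):

```python
import math

def bonferroni_partial_sums(n, q):
    """Partial sums 1 - S_1 + S_2 - ... where S_j = C(n, j) q^j.

    S_j is the sum of the probabilities of all j-fold intersections of
    n independent events, each of probability q.
    """
    sums = []
    total = 0.0
    for j in range(n + 1):
        total += (-1) ** j * math.comb(n, j) * q ** j
        sums.append(total)
    return sums

n, q = 12, 0.2
exact = (1 - q) ** n                 # probability that none of the events occurs
partial = bonferroni_partial_sums(n, q)
```

Partial sums with an even number of correction terms lie above `exact`, those with an odd number lie below, and the full sum reproduces it.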


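To investigate the behavior of $\theta p_1(\theta t_1)$ for large $a$, we write (4.8) for $n = 1$ as

\[
\theta p_1(\theta t_1) = \theta \sqrt{\frac{2\pi}{\alpha}}\; p(x_1 \mid x_0 = a)\big|_{x_1 = a}\; I_a,
\tag{4.11}
\]

\[
I_a = \int_0^{\infty} d\xi_0 \int_{-\infty}^{\infty} d\xi_1\; \xi_0\, |\xi_1|\; p(\xi_0, \xi_1 \mid x_0 = a,\ x_1 = a).
\tag{4.12}
\]

An elementary calculation shows that

\[
\theta \sqrt{\frac{2\pi}{\alpha}}\; p(x_1 \mid x_0 = a)\big|_{x_1 = a}
= \theta \sqrt{\frac{2\pi}{\alpha}}\; \frac{1}{\sqrt{2\pi(1 - \rho^2)}}\;
\exp\Bigl[ -\frac{a^2}{2}\, \frac{1 - \rho}{1 + \rho} \Bigr],
\]

where $\rho = \rho(\theta t_1)$. Using (4.1) and (3.3), there results

\[
\lim_{a \to \infty} \theta \sqrt{\frac{2\pi}{\alpha}}\; p(x_1 \mid x_0 = a)\big|_{x_1 = a}
= \frac{1}{\alpha t_1}\, e^{-\pi t_1^2/4}.
\tag{4.13}
\]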

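The factor $p(\xi_0, \xi_1 \mid x_0 = a,\ x_1 = a)$ in (4.12) is a bivariate Gaussian density.
The conditional means are found to be (see Appendix)

\[
m_0 = E(\xi_0 \mid x_0 = x_1 = a) = -\frac{a\, \rho'(\theta t_1)}{1 + \rho(\theta t_1)}
= -E(\xi_1 \mid x_0 = x_1 = a) = -m_1.
\]

The conditional covariances are

\[
\lambda_{00} = E[(\xi_0 - m_0)^2 \mid x_0 = x_1 = a]
= \frac{\alpha - [\rho'(\theta t_1)]^2 - \alpha \rho^2(\theta t_1)}{1 - \rho^2(\theta t_1)}
= E[(\xi_1 - m_1)^2 \mid x_0 = x_1 = a] = \lambda_{11},
\]

\[
\lambda_{01} = E[(\xi_0 - m_0)(\xi_1 - m_1) \mid x_0 = x_1 = a]
= \frac{\rho^2(\theta t_1)\rho''(\theta t_1) - \rho''(\theta t_1) - \rho(\theta t_1)[\rho'(\theta t_1)]^2}{1 - \rho^2(\theta t_1)}.
\]

As $a \to \infty$, $m_0 \to \sqrt{\alpha\pi/2}\; t_1$, $m_1 \to -\sqrt{\alpha\pi/2}\; t_1$; the covariances are $O(\theta)$ if $c_3 \ne 0$
and $o(\theta)$ if $c_3 = 0$. By standard arguments, then, as $a \to \infty$ the contribution to
$I_a$ comes entirely from the neighborhood of the point $(m_0, m_1)$ and

\[
\lim_{a \to \infty} I_a =
\begin{cases}
\dfrac{\alpha\pi}{2}\, t_1^2, & t_1 \ge 0,\\
0, & t_1 < 0.
\end{cases}
\]

Combining this result with (4.13), we find

\[
\lim_{a \to \infty} \theta p_1(\theta t_1) = \frac{\pi}{2}\, t_1\, e^{-\pi t_1^2/4}, \qquad t_1 \ge 0.
\tag{4.14}
\]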


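To show that the right member of (4.10) approaches zero, we write (4.8)
for $n = 2$ in the form

\[
\theta^2 p_2(\theta t_1, \theta t_2) = p(x_1, x_2 \mid x_0 = a)\big|_{x_1 = x_2 = a}\; J_a
\tag{4.15}
\]

with

\[
J_a = \theta^2 \sqrt{\frac{2\pi}{\alpha}} \int_0^{\infty} d\xi_0 \int_{-\infty}^{\infty} d\xi_1 \int_{-\infty}^{\infty} d\xi_2\;
\xi_0\, |\xi_1|\, |\xi_2|\; p(\xi_0, \xi_1, \xi_2 \mid x_0 = x_1 = x_2 = a).
\]

The first factor of (4.15) is of the form

\[
p(x_1, x_2 \mid x_0 = a)\big|_{x_1 = x_2 = a} = \frac{1}{2\pi\sqrt{d}}\; e^{-\frac{a^2}{2} h(\theta)}
\tag{4.16}
\]

where

\[
d = 1 + 2\rho(\theta t_1)\rho(\theta t_2)\rho[\theta(t_2 - t_1)] - \rho^2(\theta t_1) - \rho^2(\theta t_2) - \rho^2[\theta(t_2 - t_1)]
\]

and

\[
h(\theta) = \frac{2}{d}\, [1 - \rho(\theta t_1)]\, [1 - \rho(\theta t_2)]\, \{1 - \rho[\theta(t_2 - t_1)]\}.
\]

A lengthy calculation shows that as $a \to \infty$,

\[
d =
\begin{cases}
\tfrac{2}{3}\, \alpha c_3\, t_1^2 t_2 (t_2 - t_1)^2\, \theta^5 + o(\theta^5), & c_3 \ne 0,\\[4pt]
\tfrac{1}{4}\, \alpha (c_4 - \alpha^2)\, t_1^2 t_2^2 (t_2 - t_1)^2\, \theta^6 + o(\theta^6), & c_3 = 0,
\end{cases}
\tag{4.17}
\]

so that

\[
\frac{a^2}{2}\, h(\theta) =
\begin{cases}
\dfrac{3\pi\alpha\, t_2}{8 c_3}\, \dfrac{1}{\theta} + o\Bigl(\dfrac{1}{\theta}\Bigr), & c_3 \ne 0,\\[8pt]
\dfrac{\pi\alpha}{c_4 - \alpha^2}\, \dfrac{1}{\theta^2} + o\Bigl(\dfrac{1}{\theta^2}\Bigr), & c_3 = 0.
\end{cases}
\tag{4.18}
\]

The first factor of (4.15) therefore approaches zero at least as fast as $A\theta^{-3} e^{-B/\theta}$.
The proof is completed by showing that $J_a$ is $O(1/\theta^r)$ for some finite $r$ so that
from (4.15) $\theta^2 p_2(\theta t_1, \theta t_2) \to 0$ as $a \to \infty$. Now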


J./¢ f= 


( m e. 7 
(4.19) [aeof ae aes | Eo || &1|| & | p(go, fi, & | vo = a1 = m2 = a) 


lA 


lA 


8 E( et” | y= N= re = a) . 


This conditional expectation, however, is a multinomial in the conditional means
and variances of the $\xi$'s. These latter quantities in turn are rational functions of
$a$, $\rho(\theta t_1)$, $\rho(\theta t_2)$, $\rho[\theta(t_2 - t_1)]$, $\rho'(\theta t_1)$, $\ldots$, $\rho''[\theta(t_2 - t_1)]$. It follows then that the
right side of (4.19), and hence $J_a$ also, is $O(1/\theta^r)$.
We note in passing that in the case $c_3 = 0$ the factor $c_4 - \alpha^2$ in (4.17) and
(4.18) is non-negative and vanishes only if $\rho(\tau) = \cos \beta\tau$. This can be established
as follows. From (4.1) it is easily seen that when $c_3 = 0$, $\rho''(0)$ and $\rho^{(4)}(0)$ exist
and are given by

\[
-\alpha = \rho''(0), \qquad c_4 = \rho^{(4)}(0).
\tag{4.20}
\]

Now since $\rho(\tau)$ is a covariance function, we can write

\[
\rho(\tau) = \int_{-\infty}^{\infty} e^{2\pi i \lambda \tau}\, dF(\lambda)
\]

where $F(\lambda)$ is non-decreasing. It follows then (see [4] p. 90 for a similar argument
involving a distribution function and its characteristic function) that the second
and fourth moments of $F$ exist and that

\[
\rho''(0) = -\alpha = -4\pi^2 \int \lambda^2\, dF(\lambda),
\qquad
\rho^{(4)}(0) = c_4 = 16\pi^4 \int \lambda^4\, dF(\lambda).
\]

The Schwarz inequality then gives

\[
c_4 \ge \alpha^2
\]

with equality only if $\rho(\tau)$ is of the form $\cos \beta\tau$. Our derivation of (4.2)-(4.5) fails
in this case. Indeed, we have already excluded this process with covariance $\cos \beta\tau$
from consideration since it is not ergodic. The results (4.2)-(4.5) are still valid
for this process, however, as a separate calculation, omitted here, shows.


5. Asymptotic distribution of first return time to negative levels. As in the
preceding section, let $Q_a(T)$ be the probability density for the duration of the
excursions above the value $a$ of the $\Delta(t, \theta)$ process conditioned in the h.w. sense.
If in addition to (4.1), $\rho(\tau)$ and its first two derivatives approach zero as $\tau \to \infty$,
then

\[
\lim_{a \to -\infty} Q_a(T) = 2e^{-2T}.
\]

This result follows readily from (4.7) and (4.8) and the asymptotic formula
for $\theta$ for large negative values of $a$,

\[
\theta \sim \frac{2\pi}{\sqrt{\alpha}}\, e^{a^2/2},
\]

obtained from (3.2). For large negative $a$, the random variables $x_0 = x(\theta t_0), \ldots,
x_n = x(\theta t_n)$, $\xi_0 = x'(\theta t_0), \ldots, \xi_n = x'(\theta t_n)$ tend toward independence and the
density in the numerator of (4.8) approaches

\[
\frac{1}{(2\pi)^{n+1} \alpha^{(n+1)/2}}\,
\exp\Bigl[ -\frac{1}{2} \sum_{i=0}^{n} \Bigl( x_i^2 + \frac{\xi_i^2}{\alpha} \Bigr) \Bigr]
\]

in a uniformly continuous way. One finds, then, $\theta^n p_n(\theta t_1, \ldots, \theta t_n) \to 2^n$ and
the series (4.7) sums to $2e^{-2T}$.

If the conditioning is done in the v.w. sense, the same result is found. It is to
be noted that this limiting distribution, like those obtained for large positive $a$,
(4.2)-(4.5), is independent of the covariance $\rho(\tau)$.
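The arithmetic behind the limit $2e^{-2T}$ can be traced in a few lines. The sketch assumes that for large negative $a$ the mean excursion length is $\theta \approx (2\pi/\sqrt{\alpha})\,e^{a^2/2}$ and that $a$-crossings occur at Rice's rate $(\sqrt{\alpha}/\pi)\,e^{-a^2/2}$ per unit time, so that $\theta^n p_n \to 2^n$; function names and the values of `a`, `alpha` and `T` are illustrative.

```python
import math

def theta_negative_level(a, alpha):
    """Asymptotic mean excursion length above a large negative level a."""
    return 2 * math.pi / math.sqrt(alpha) * math.exp(a * a / 2)

def crossing_rate(a, alpha):
    """Expected number of a-crossings (up and down) per unit time."""
    return math.sqrt(alpha) / math.pi * math.exp(-a * a / 2)

def q_partial_sum(T, terms):
    """Partial sum of the series 2 - 2^2 T + 2^3 T^2 / 2! - ... from (4.7)."""
    return sum((-1) ** n * 2 ** (n + 1) * T ** n / math.factorial(n) for n in range(terms))

a, alpha, T = -4.0, 1.3, 0.7
scaled_rate = theta_negative_level(a, alpha) * crossing_rate(a, alpha)
series_value = q_partial_sum(T, 40)
```

Here `scaled_rate` is the crossing rate measured in units of $\theta$, and `series_value` is the partial sum of (4.7) once each $\theta^n p_n$ has been replaced by $2^n$.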


APPENDIX 


The detailed calculations of this paper make frequent use of the multivariate 
conditional densities for Gaussian variables. Since these densities do not appear 
to be readily available in the literature, we present them here for the reader’s con- 
venience. They can be derived with a little effort from material given in many 
texts, e.g. [4] or [7], pp. 27-30. 

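Let $\xi_1, \ldots, \xi_n$ be jointly Gaussian with $E\xi_i = 0$, $E\xi_i\xi_j = \lambda_{ij}$, $i, j = 1, 2, \ldots, n$.
Then

\[
p(\xi_{p+1}, \ldots, \xi_n \mid \xi_1, \ldots, \xi_p)
= \frac{\exp\Bigl[ -\tfrac{1}{2} \sum_{i,j=p+1}^{n} \mu^{-1}_{ij}\, (\xi_i - m_i)(\xi_j - m_j) \Bigr]}
{(2\pi)^{(n-p)/2}\, |\mu|^{1/2}}
\]

where $\mu^{-1}_{ij}$ denotes an element of the inverse of the matrix $(\mu_{ij})$, $|\mu|$ is its determinant,

\[
m_i = E(\xi_i \mid \xi_1, \ldots, \xi_p) = \sum_{j=1}^{p} \beta_{ij}\, \xi_j, \qquad i = p + 1, \ldots, n,
\]

\[
\mu_{ij} = E[(\xi_i - m_i)(\xi_j - m_j) \mid \xi_1, \ldots, \xi_p], \qquad i, j = p + 1, \ldots, n,
\]

and

\[
\beta_{ij} = \frac{1}{d}
\begin{vmatrix}
\lambda_{11} & \cdots & \lambda_{1,j-1} & \lambda_{1i} & \lambda_{1,j+1} & \cdots & \lambda_{1p} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
\lambda_{p1} & \cdots & \lambda_{p,j-1} & \lambda_{pi} & \lambda_{p,j+1} & \cdots & \lambda_{pp}
\end{vmatrix},
\qquad
\mu_{ij} = \frac{1}{d}
\begin{vmatrix}
\lambda_{ij} & \lambda_{i1} & \cdots & \lambda_{ip} \\
\lambda_{1j} & \lambda_{11} & \cdots & \lambda_{1p} \\
\vdots & \vdots & & \vdots \\
\lambda_{pj} & \lambda_{p1} & \cdots & \lambda_{pp}
\end{vmatrix},
\]

\[
d =
\begin{vmatrix}
\lambda_{11} & \cdots & \lambda_{1p} \\
\vdots & & \vdots \\
\lambda_{p1} & \cdots & \lambda_{pp}
\end{vmatrix}.
\]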
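The determinant formulas can be exercised on the conditional moments used in Section 4. The sketch below is an illustration under stated assumptions: the formulas are read as Cramer's rule for $\beta_{ij}$ and a bordered determinant for $\mu_{ij}$, the covariance is taken as $\rho(\tau) = e^{-\tau^2/2}$, and the helper names `det` and `conditional_moments` are hypothetical.

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def conditional_moments(lam, p):
    """Regression coefficients beta and conditional covariances mu.

    lam is the full covariance matrix; the first p variables are conditioned on.
    Indices are 0-based, so the conditioned variables are 0, ..., p-1.
    """
    n = len(lam)
    base = [row[:p] for row in lam[:p]]
    d = det(base)
    beta, mu = {}, {}
    for i in range(p, n):
        for j in range(p):
            replaced = [row[:] for row in base]
            for k in range(p):
                replaced[k][j] = lam[k][i]      # column j replaced by covariances with variable i
            beta[(i, j)] = det(replaced) / d
        for j in range(p, n):
            bordered = [[lam[i][j]] + lam[i][:p]]
            bordered += [[lam[k][j]] + lam[k][:p] for k in range(p)]
            mu[(i, j)] = det(bordered) / d
    return beta, mu

# Variables in order: x(0), x(s), x'(0), x'(s) for rho(tau) = exp(-tau^2 / 2), alpha = 1.
s, a = 0.7, 2.0
r = math.exp(-s * s / 2)          # rho(s)
r1 = -s * r                       # rho'(s)
r2 = (s * s - 1) * r              # rho''(s)
lam = [
    [1.0, r, 0.0, r1],
    [r, 1.0, -r1, 0.0],
    [0.0, -r1, 1.0, -r2],
    [r1, 0.0, -r2, 1.0],
]
beta, mu = conditional_moments(lam, 2)
m0 = (beta[(2, 0)] + beta[(2, 1)]) * a        # E(x'(0) | x(0) = x(s) = a)
```

For this covariance `m0` should equal $-a\rho'(s)/(1 + \rho(s))$, and `mu[(2, 2)]` and `mu[(2, 3)]` should reproduce the conditional variance and covariance of the slopes given in Section 4.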
THE CAPACITY OF A CLASS OF CHANNELS


By David Blackwell, Leo Breiman, and A. J. Thomasian
University of California, Berkeley


1. Summary. Shannon's basic theorem on the capacity of a channel is
generalized to the case of a class of memoryless channels. A generalized capacity is
defined and is shown to be the supremum of attainable transmission rates when the
coding and decoding procedure must be satisfactory for every channel in the
class.


2. Definitions and Introduction. For any positive integer $n$ and any set $\mathcal{A}$ we
denote by $\mathcal{A}^{(n)}$ the set of all $n$-tuples $(x_1, \ldots, x_n)$ with each $x_i \in \mathcal{A}$.

A channel, denoted by $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ or by $P(y \mid x)$, consists of two finite
sets $\mathcal{A}$, $\mathcal{B}$ having $a \ge 2$, $b \ge 2$ elements, respectively, and a set of probability
distributions $P(\cdot \mid x)$ on $\mathcal{B}$, one for each $x \in \mathcal{A}$. $P(y \mid x)$ is interpreted as the
probability of receiving $y \in \mathcal{B}$ given that $x \in \mathcal{A}$ was transmitted.

The $n$-extension of a channel $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ is the channel $(\mathcal{A}^{(n)}, \mathcal{B}^{(n)},
P(v \mid u))$ where $v = (y_1, \ldots, y_n) \in \mathcal{B}^{(n)}$, $u = (x_1, \ldots, x_n) \in \mathcal{A}^{(n)}$ and
$P(v \mid u) = \prod_{i=1}^{n} P(y_i \mid x_i)$.

When considering a class of channels, $(\mathcal{A}, \mathcal{B}, P_\gamma(y \mid x))$ for $\gamma \in \mathcal{C}$, where $\mathcal{C}$
is an index set, we shall always assume that the $\mathcal{A}$, $\mathcal{B}$ sets are the same for each
channel in the class. We shall sometimes denote such a class of channels by $\mathcal{C}$,
the index set.

A $(G, \epsilon_n, n)$ code for a class $\mathcal{C}$ of channels for $G \ge 1$, $\epsilon_n \ge 0$, $n$ a positive
integer, is a sequence of $[G]$ distinct elements of $\mathcal{A}^{(n)}$; $u_1, \ldots, u_{[G]}$; where $[G]$
is the largest integer $\le G$, and a sequence of $[G]$ disjoint subsets of $\mathcal{B}^{(n)}$; $B_1, \ldots,
B_{[G]}$; such that

\[
P_\gamma(B_i^c \mid u_i) \le \epsilon_n \quad \text{for } i = 1, \ldots, [G] \text{ and all } \gamma \in \mathcal{C}.
\]

The set $\{u_1, \ldots, u_{[G]}\}$ is called the set of input messages of the code and $B_i$
is called the decoding set for $u_i$. We think of an input letter $u_i$ of the code as
being selected arbitrarily and transmitted over an unknown one of the channels
$P_\gamma$, $\gamma \in \mathcal{C}$. The letter $v$ is received with probability $P_\gamma(v \mid u)$ and if $v \in B_i$ it is
decoded as $u_i$. Thus, the probability is $\le \epsilon_n$ that any input message $u_i$ will be
transmitted so as to be not decoded as $u_i$, regardless of which channel in the
class $\mathcal{C}$ is used.

An $R \ge 0$ is an attainable transmission rate for a class $\mathcal{C}$ of channels if there
exists a sequence of $(e^{nR}, \epsilon_n, n)$ codes for $\mathcal{C}$ with $\epsilon_n \to 0$. Since $\mathcal{A}^{(n)}$ has only
$a^n$ points we know that any attainable rate $R \le \log a$. Clearly 0 is an attainable
rate for any class of channels. For any class of channels $\mathcal{C}$ we define $T = T(\mathcal{C})$
to be the supremum of the set of attainable rates for $\mathcal{C}$.
If $(\mathcal{A}, \mathcal{B}, P_\gamma(y \mid x))$ for $\gamma \in \mathcal{C}$ is a class of channels and $Q(x)$ is a given
probability distribution on $\mathcal{A}$ then for each $\gamma \in \mathcal{C}$ we let $P_\gamma(x, y) = P_\gamma(y \mid x)Q(x)$
and we define on $\mathcal{A} \times \mathcal{B}$ the random variable $J_\gamma$ by

\[
J_\gamma(x, y) =
\begin{cases}
\log \dfrac{P_\gamma(x, y)}{P_\gamma(x) P_\gamma(y)}, & \text{if } P_\gamma(x, y) > 0,\\[6pt]
0, & \text{if } P_\gamma(x, y) = 0.
\end{cases}
\]

The dependence of $P_\gamma$ and $J_\gamma$ on $Q$ will usually not be exhibited. Since we will
often be interested in expressions of the form $x \log x$ it is natural to
define $\log 0 = 0$. We will denote the expectation of a random variable $X$ with
respect to the $P_\gamma$ distribution by $E_\gamma X$. If $\mathcal{C}$ has only one element we may drop
the subscript $\gamma$. Finally for any class $\mathcal{C}$ of channels we define the capacity of the
class $\mathcal{C}$ by

\[
C(\mathcal{C}) = C = \sup_{Q(x)} \inf_{\gamma \in \mathcal{C}} E_\gamma J_\gamma
\]

where the sup is over all distributions $Q$ on $\mathcal{A}$.

In the case considered by Shannon, $\mathcal{C}$ has only one element and our formula
reduces to $C = \sup_Q EJ$, which is the usual formula for the capacity of a
memoryless channel. Shannon's theorem then states that $T = C$. $T \ge C$, $T \le C$ are
called the direct and converse halves, respectively. This theorem for a single
channel has been proved in various ways and under various conditions by
Shannon [12], [13], McMillan [11], Feinstein [6], Khinchin [9], Wolfowitz [14],
Blackwell, Breiman, and Thomasian [1]. We will show that within the
framework that has been set up

\[
T(\mathcal{C}) = C(\mathcal{C})
\]

always holds true. This result follows immediately from Theorem 1 which also
gives an exponential error bound for any rate $R < C$.

THEOREM 1: Let $(\mathcal{A}, \mathcal{B}, P_\gamma(y \mid x))$ for $\gamma \in \mathcal{C}$ be any class of channels.

(a) For any integer $n$ and any $R > 0$ such that $0 < C - R \le 1/2$ there is an
$(e^{nR}, \epsilon_n, n)$ code for $\mathcal{C}$ with


_(c-R)? 
¢e, = Ae #* 


10 33 ab 
=| 2 ab | aed Bo Bes 


where 


(C — R} 


(b) For any integer $n$ and $R > C$, if $e^{nR} \ge 2$ then any $(e^{nR}, \epsilon_n, n)$ code for
$\mathcal{C}$ must satisfy

\[
\epsilon_n \ge 1 - \frac{C + \dfrac{\log 2}{n}}{R - \dfrac{\log 2}{n}}.
\]

The sequence of steps used in proving Theorem 1 will be outlined. Theorem 2
presents a basic inequality, for a single channel, which is contained implicitly
in Feinstein [8]. This inequality is of independent interest since it gives the same
bound for the maximum probability of error that Shannon [13] gives for the
average probability of error. Theorem 2 permits a simple proof of $T \ge C$ for a
single channel. Lemma 2 shows that $\sup_Q$ in the definition of $C(\mathcal{C})$ can be
replaced by $\max_Q$. Theorem 3 gives an exponential bound on the error of a code
for one channel, which depends only on $a$, $b$, $(C - R)$. This is convenient in that
the particular probabilities $P(y \mid x)$ may not be known and, in any case, need
not be computed with. Results related to Theorem 3 have been given by Elias
[3] and [4], Feinstein [7], Shannon [13], and Wolfowitz [14].

Lemma 3 generalizes the inequality of Theorem 2 to the case when $\mathcal{C}$ has a
finite number of elements, and Theorem 4 generalizes the exponential error
bound of Theorem 3 to this case.

Lemma 4 shows that for a given $\mathcal{A}$, $\mathcal{B}$ there is a large finite number of channels
on $\mathcal{A}$, $\mathcal{B}$ such that any channel on $\mathcal{A}$, $\mathcal{B}$ is close, in several senses, to one of them.
Lemma 5 shows that if a channel has a sequence of codes $(e^{nR}, \epsilon_n, n)$ with
$\epsilon_n \le e^{-Bn}$ for large $n$, with $B > 0$, then this same sequence of codes can be used
for all channels in a certain neighborhood of the channel. This result justifies
some of our attention to exponential error bounds. The technique of Lemma 5
can also be used to get some similar results when the channel probabilities vary
from letter to letter.

At this point the direct half of Theorem 1 is demonstrated by approximating
the class $\mathcal{C}$ of channels by a certain finite set of channels $\mathcal{C}'$ from Lemma 4;
obtaining an exponential error bound code for $\mathcal{C}'$ from Theorem 4; and using
Lemma 5 to show that such a code must be satisfactory for $\mathcal{C}$.

The converse half of Theorem 1 is then proved.

Before proceeding to the proofs we pause to clear up one point. It is obvious 
that 

C(¢@) Ss inf sup E£,J,, 
yee Q(z) 
i.e., C(@) S the capacity of every channel in €. We now exhibit an example 
where C(@) = inf of the capacities of channels in @. Let @ = @ = {1, 2, 3, 4}, 
= {1, 2}, and let Pi(y | x) and P2(y | x) be defined by the left and right follow- 
ing matrices, respectively. 


4 4 0 O/ /t 4 4 3 
00 4 $\[% 4 2 2 
t % % 3/\3 3 0 0 
+ 4 34 2/\0 0 3 3 
Let $Q(x)$ be any distribution on $\mathcal{A}$ and let $H_1(Y) = -\sum_y P_1(y) \log P_1(y)$,
$H_1(Y \mid X) = -\sum_x Q(x) \sum_y P_1(y \mid x) \log P_1(y \mid x)$. Using the fact that $\log z =
(\log 2)\log_2 z$ we see that $(\log 2)^{-1} H_1(Y \mid X) = Q(1) + Q(2) + 2Q(3) +
2Q(4) = 1 + Q(3) + Q(4)$. Also from Feinstein [8], p. 15 we have
$(\log 2)^{-1} H_1(Y) \le 2$ so that $E_1 J_1 = H_1(Y) - H_1(Y \mid X) \le (\log 2)(Q(1) +
Q(2))$. Similarly $E_2 J_2 \le (\log 2)(Q(3) + Q(4))$ so that $C(\mathcal{C}) \le (1/2) \log 2$.
The case $Q(i) = 1/4$ for $i = 1, \dots, 4$ shows that $C(\mathcal{C}) = (1/2) \log 2$; the case
$Q(1) = Q(2) = 1/2$ shows the capacity of channel one to be $\log 2$; the case
$Q(3) = Q(4) = 1/2$ shows the capacity of channel two to be $\log 2$. Thus for
this example

$$\tfrac{1}{2} \log 2 = C(\mathcal{C}) < \inf_{\gamma \in \mathcal{C}} \sup_{Q(x)} E_\gamma J_\gamma = \log 2.$$
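The numbers in this example can be checked directly. The following is a minimal sketch, assuming the two transition matrices are the ones implied by the entropy computation above (two rows with mass 1/2, 1/2 and two uniform rows in each channel); the function names `expected_information` and `worst_case` are illustrative and not part of the paper.

```python
import math

# Transition matrices assumed for the example (rows: input x, columns: output y).
P1 = [[0.5, 0.5, 0.0, 0.0],
      [0.0, 0.0, 0.5, 0.5],
      [0.25, 0.25, 0.25, 0.25],
      [0.25, 0.25, 0.25, 0.25]]
P2 = [[0.25, 0.25, 0.25, 0.25],
      [0.25, 0.25, 0.25, 0.25],
      [0.5, 0.5, 0.0, 0.0],
      [0.0, 0.0, 0.5, 0.5]]


def expected_information(Q, P):
    """E J = sum over (x, y) of Q(x) P(y|x) log(P(y|x) / P(y)), in nats."""
    outputs = range(len(P[0]))
    Py = [sum(Q[x] * P[x][y] for x in range(len(Q))) for y in outputs]
    total = 0.0
    for x, qx in enumerate(Q):
        for y in outputs:
            if qx > 0 and P[x][y] > 0:
                total += qx * P[x][y] * math.log(P[x][y] / Py[y])
    return total


def worst_case(Q):
    """inf over the two channels of E J for a given input distribution Q."""
    return min(expected_information(Q, P1), expected_information(Q, P2))


uniform = [0.25] * 4
print(worst_case(uniform) / math.log(2))                            # in units of log 2
print(expected_information([0.5, 0.5, 0, 0], P1) / math.log(2))     # channel one alone
print(worst_case([0.5, 0.5, 0, 0]) / math.log(2))                   # same Q, worst channel
```

The input distribution that is best for channel one alone is useless for channel two, which is why the class capacity falls below the infimum of the individual capacities.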

3. A basic inequality. 

THEOREM 2: For any channel $(\mathcal{A}, \mathcal{B}, P(y \mid x))$, any distribution $Q(x)$ on $\mathcal{A}$,
$a > 0$, $G \ge 1$ there is a $(G, \epsilon, 1)$ code for the channel with $\epsilon = Ge^{-a} + P(J \le a)$.

PROOF: It is clearly sufficient to construct an $(M, \epsilon, 1)$ code with the same $\epsilon$
as in the theorem and with $M \ge G$. Let $A = [J > a]$ and for any $x_0 \in \mathcal{A}$ let
$A_{x_0} = \{(x, y) \mid (x_0, y) \in A\}$. $P(J \le a) \le \epsilon$ so that $P(A) \ge 1 - \epsilon$, hence there
is an $x_1$ such that $P(A \mid x_1) \ge 1 - \epsilon$. Let $B_1 = A_{x_1}$. (Each $B_k$ will be a cylinder
set with base in $\mathcal{B}$. The base of $B_k$ will be the decoding set for $x_k$.) At the kth
step select $x_k$ such that $P(B_k \mid x_k) \ge 1 - \epsilon$ where

$$B_k = A_{x_k} - \bigcup_{i=1}^{k-1} A_{x_i}.$$

This process will terminate at some $M \ge 1$. For every $x$

$$P\Bigl(A - A \cap \Bigl(\bigcup_{i=1}^{M} A_{x_i}\Bigr) \Bigm| x\Bigr) < 1 - \epsilon ,$$

otherwise we could add this $x$ to $x_1, \dots, x_M$ contradicting the definition of $M$.
Thus

$$P(A) = P\Bigl(A \cap \Bigl(\bigcup_{i=1}^{M} A_{x_i}\Bigr)\Bigr) + P\Bigl(A - A \cap \Bigl(\bigcup_{i=1}^{M} A_{x_i}\Bigr)\Bigr) \le \sum_{i=1}^{M} P(A_{x_i}) + 1 - \epsilon .$$

Now if $(x, y) \in A$ then $J(x, y) > a$ so that $P(y \mid x) > P(y)e^{a}$. For fixed $x$ sum
both sides of this inequality over all $y$ such that $(x, y) \in A$. Then

$$1 \ge P(A \mid x) \ge P(A_x)e^{a}.$$

Thus $P(A_x) \le e^{-a}$ for any $x \in \mathcal{A}$ so that $P(A) \le Me^{-a} + 1 - \epsilon$. Since $P(A) =
Ge^{-a} + 1 - \epsilon$, we have $M \ge G$. Clearly the $B_1, \dots, B_M$ are disjoint and

$$P(B_k \mid x_k) \ge 1 - \epsilon$$

for $k = 1, \dots, M$ so the proof is completed.
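The greedy selection used in this proof can be sketched in a few lines for block length 1. This is only an illustration under stated assumptions: the 4-letter test channel, the values of `a` and `eps`, and the name `greedy_code` are all hypothetical choices made for the example.

```python
import math


def greedy_code(P, Q, a, eps):
    """Greedy selection of input letters and disjoint decoding sets.

    P[x][y] is P(y|x) and Q[x] is the input distribution. A letter x is
    added whenever the part of A_x = {y : J(x, y) > a} not yet claimed by
    earlier letters still has conditional probability at least 1 - eps.
    """
    n_in, n_out = len(P), len(P[0])
    Py = [sum(Q[x] * P[x][y] for x in range(n_in)) for y in range(n_out)]
    # J(x, y) = log(P(y|x) / P(y)) on pairs of positive probability.
    A = [{y for y in range(n_out)
          if Q[x] > 0 and P[x][y] > 0 and math.log(P[x][y] / Py[y]) > a}
         for x in range(n_in)]
    used, code = set(), []
    progress = True
    while progress:
        progress = False
        for x in range(n_in):
            if any(x == chosen for chosen, _ in code):
                continue
            B = A[x] - used                      # B_k = A_{x_k} minus earlier sets
            if sum(P[x][y] for y in B) >= 1 - eps:
                code.append((x, B))
                used |= B
                progress = True
                break
    return code


# Hypothetical noisy 4-letter channel: correct letter with probability 0.91.
P = [[0.91 if x == y else 0.03 for y in range(4)] for x in range(4)]
Q = [0.25] * 4
a, eps = 1.0, 0.5
code = greedy_code(P, Q, a, eps)
print(len(code), code)
```

The loop stops exactly when no further letter can be added, which is the terminating condition the proof uses to derive $M \ge G$.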

Consider a single channel $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ and let $Q(x)$ be specified and
determine $P(x, y)$, $J(x, y)$. Applying Theorem 2 to $(\mathcal{A}^{(n)}, \mathcal{B}^{(n)}, P(v \mid u))$ and
$Q(u) = Q(x_1) \cdots Q(x_n)$ with $a = n(R + EJ)/2$, $G = e^{Rn}$ we see that for any
$R$ such that $0 < R < EJ$ there is an $(e^{Rn}, \epsilon_n, n)$ code for $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ with

$$\epsilon_n = e^{-n(EJ - R)/2} + P\Bigl(\frac{1}{n} J^n \le \frac{R + EJ}{2}\Bigr).$$

Now

$$J^n(u, v) = \log \frac{P(u, v)}{P(u)P(v)} \ \text{ if } P(u, v) > 0, \qquad = 0 \ \text{ otherwise.}$$

Let $J'(u, v) = \sum_{i=1}^{n} J_i(x_i, y_i)$ where

$$J_i(x_i, y_i) = \log \frac{P(x_i, y_i)}{P(x_i)P(y_i)} \ \text{ if } P(x_i, y_i) > 0, \qquad = 0 \ \text{ otherwise.}$$

Clearly $P(J' = J^n) = 1$ and $J'$ is the sum of $n$ independent random variables
each having the distribution of $J(x, y)$. Since $EJ > (R + EJ)/2$ we see that
$\epsilon_n \to 0$. Now it is easily seen (and we will shortly prove even more) that for a
fixed channel $EJ$ is a continuous function of $(Q(x_1), Q(x_2), \dots, Q(x_a))$ and
since the domain of the function is a closed bounded subset of Euclidean space
the supremum is actually achieved. Thus for any channel $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ there
is a distribution $Q(x)$ on $\mathcal{A}$ such that $C = EJ$. Using this $Q(x)$ in the earlier portions
of this paragraph we obtain the direct half of Shannon's theorem for a
memoryless channel: $T \ge C$.

By introducing a brief epsilon argument in the proof of the direct half of
Shannon's theorem we could clearly have ignored the question of whether or
not there is a maximizing $Q(x)$. Although the fact that there is a maximizing
$Q(x)$ in the general case of a class of channels is not vital in the following work,
we will pause to prove this fact now. The proof is based on Lemma 1 which will
be needed later.

LEMMA 1: Let $Q(x)$, $Q'(x)$ be any two distributions on $\mathcal{A}$ such that

$$|Q(x) - Q'(x)| \le \epsilon \le 1/e \quad \text{for all } x \in \mathcal{A}.$$

Then

$$|H(X) - H'(X)| \le a\epsilon^{1/2}$$

where $H(X) = -\sum_x Q(x) \log Q(x)$ and $H'(X) = -\sum_x Q'(x) \log Q'(x)$.

PROOF: Let

$$f(y) = [-(y + \epsilon) \log (y + \epsilon)] - [-y \log y]$$

where $0 < \epsilon \le 1/e$ and $0 \le y \le 1 - \epsilon$. Then $f(0) = -\epsilon \log \epsilon > 0$ and $f(1 - \epsilon)
= (1 - \epsilon) \log (1 - \epsilon) < 0$; also

$$f'(y) = -\log (y + \epsilon) - 1 + \log y + 1 = \log \frac{y}{y + \epsilon} < 0$$

so that $|f(y)| \le \max \{-\epsilon \log \epsilon, \ -(1 - \epsilon) \log(1 - \epsilon)\}$. Now

$$(1 - \epsilon) \log \frac{1}{1 - \epsilon} \le (1 - \epsilon)\Bigl(\frac{1}{1 - \epsilon} - 1\Bigr) = \epsilon \le -\epsilon \log \epsilon$$

since $\epsilon \le 1/e$. Thus

$$|f(y)| \le -\epsilon \log \epsilon \le \epsilon^{1/2}$$

since $x^{1/2} - \log x \ge 2 - \log 4 > 0$ for $x > 0$. Applying the result $|f(y)| \le
\epsilon^{1/2}$ to $y = p$, $\epsilon = q - p$ where $0 \le p \le q \le 1$ and $|q - p| \le 1/e$ we see that

$$|[-p \log p] - [-q \log q]| \le (|p - q|)^{1/2}$$

which easily gives us the bound on $|H(X) - H'(X)|$ completing the proof.
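A quick numerical sanity check of the Lemma 1 bound can be run on random distributions. This is a sketch only: the alphabet size, the perturbation scheme and the helper names are assumptions made for the illustration, and the bound is read as $a\epsilon^{1/2}$ with $a$ the number of elements of the input alphabet.

```python
import math
import random


def entropy(Q):
    """H(X) = -sum Q(x) log Q(x), natural logarithm."""
    return -sum(q * math.log(q) for q in Q if q > 0)


def lemma1_gap(Q, Qp):
    """Return (|H - H'|, a * sqrt(eps), eps) with eps = max_x |Q(x) - Q'(x)|."""
    eps = max(abs(q - qp) for q, qp in zip(Q, Qp))
    return abs(entropy(Q) - entropy(Qp)), len(Q) * math.sqrt(eps), eps


def random_distribution(rng, size):
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
results = []
for _ in range(1000):
    Q = random_distribution(rng, 5)
    R = random_distribution(rng, 5)
    # Mixing keeps every coordinate within 0.1 of Q, so eps <= 0.1 < 1/e.
    Qp = [0.9 * q + 0.1 * r for q, r in zip(Q, R)]
    results.append(lemma1_gap(Q, Qp))

print(max(gap / bound for gap, bound, _ in results))  # largest observed ratio
```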

LEMMA 2: For any class of channels $(\mathcal{A}, \mathcal{B}, P_\gamma(y \mid x))$ for $\gamma \in \mathcal{C}$,

$$C = \max_{Q(x)} \inf_{\gamma \in \mathcal{C}} E_\gamma J_\gamma .$$

PROOF: Let $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ be a channel and $Q(x)$ a distribution on $\mathcal{A}$ determining
$P(x, y) = P(y \mid x)Q(x)$ and $J(x, y)$. Clearly $EJ = H(X) + H(Y) -
H(X, Y)$ where $H(X) = -\sum_x P(x) \log P(x)$, $H(Y) = -\sum_y P(y) \log P(y)$,
$H(X, Y) = -\sum_{x,y} P(x, y) \log P(x, y)$. Let $Q'(x)$ be another distribution on
$\mathcal{A}$ determining $P'(x, y) = P(y \mid x)Q'(x)$ and $J'(x, y)$, and note that $E'J' =
H'(X) + H'(Y) - H'(X, Y)$ where the primed quantities have analogous
definitions. Assume that $|Q(x) - Q'(x)| \le \epsilon \le 1/e$ for all $x \in \mathcal{A}$. Clearly
$|P(x, y) - P'(x, y)| \le P(y \mid x)\,|Q(x) - Q'(x)| \le \epsilon$ and $|P(y) - P'(y)|
\le \sum_x |P(x, y) - P'(x, y)| \le a\epsilon$. Applying Lemma 1 we get

$$|EJ - E'J'| \le |H(X) - H'(X)| + |H(Y) - H'(Y)| + |H(X, Y) - H'(X, Y)|$$
$$\le a\epsilon^{1/2} + b(a\epsilon)^{1/2} + ab\epsilon^{1/2} \le (a + 2ab)\epsilon^{1/2}.$$

Thus not only is $EJ$ continuous in $Q(x)$ but it is continuous in $Q(x)$ uniformly
in $Q(x)$ and $P(y \mid x)$. We easily take $\inf_{\gamma \in \mathcal{C}}$ on the inequalities

$$E_\gamma J_\gamma - (a + 2ab)\epsilon^{1/2} \le E'_\gamma J'_\gamma \le E_\gamma J_\gamma + (a + 2ab)\epsilon^{1/2}$$

and see that $\inf_{\gamma \in \mathcal{C}} E_\gamma J_\gamma$ is continuous in $Q(x)$ so that once again there is a maximizing
$Q(x)$ and Lemma 2 is proved.

4. The error bound for one channel. 

THEOREM 3: Let $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ be any channel. For any integer $n$ and any
$R > 0$ such that $0 \le C - R \le 1/2$, there is an $(e^{Rn}, \epsilon_n, n)$ code for the channel with

$$\epsilon_n = 2e^{-\frac{(C - R)^2}{16ab} n}.$$

PROOF: Applying Theorem 2 to $(\mathcal{A}^{(n)}, \mathcal{B}^{(n)}, P(v \mid u))$ with $Q(u) = Q(x_1) \cdots
Q(x_n)$, where $Q(x)$ is any distribution on $\mathcal{A}$, $G = e^{Rn}$, $a = (R + \theta)n$ we see that
for any $R > 0$, $\theta > 0$ there is an $(e^{Rn}, \epsilon_n, n)$ code for $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ with

$$\epsilon_n = e^{-\theta n} + P(J^n \le n(R + \theta))$$

where, as shown in Section 3, $J^n$ is the sum of $n$ independent random variables,
each having the distribution of $J(x, y)$. Select $R > 0$, $0 \le EJ - R \le 1/2$ and
let $\theta = (EJ - R)^2$. Then $R + \theta \le R + (EJ - R)/2 = (EJ + R)/2$.

Thus it remains only to show that

$$P\bigl(J^n \le n(EJ + R)/2\bigr) \le e^{-\frac{(EJ - R)^2}{16ab} n}$$

(we will need this result later) for we can then choose $Q$ so that $C = EJ$.
A method due to Chernoff [2] will be used to bound the probability in question.
Let $0 \le t \le 1$, then

$$P\Bigl(J^n \le \frac{n(EJ + R)}{2}\Bigr) = P\Bigl(e^{-tJ^n} \ge e^{-\frac{tn(EJ + R)}{2}}\Bigr) \le e^{\frac{tn(EJ + R)}{2}} E e^{-tJ^n} = \Bigl[e^{\frac{t(EJ + R)}{2}} E e^{-tJ}\Bigr]^n$$

so that we need show only that for a proper selection of $t$,

$$e^{\frac{t(EJ + R)}{2}} E e^{-tJ} \le e^{-\frac{(EJ - R)^2}{16ab}}.$$





Now 
Ee’ =1- tJ +5 Ese, 0<0<1. 


We need consider only (x, y) with P(z, y) > 0. Terms in EJ’e*” are of the 
form 


Hic Pry » Plz) 2p (3 ) 2_P(x,y) 
POV Bey ) 8 Papa) =P" Paw) * Pa@Pa) 





<= (P(2, y))*" log” pain S = 


where the last inequality followed from P(2, y) S P(x)P(y)/P(2, y) Ss 1/P 
(x, y). Also 


(P(x, y))'* log’ P(x, y) 





(P(e, 9) log PCa, w)F = (525) (Py) ¥ tog PCa, ¥)) FF 
2 \1 1 
. (; *.) @ 5 7-H 


— , t ab sense ome 
Ee’ 31 —tkhl + —- ——- Se 2 @-s)* 


20-7 


Thus 


so that 


t(BJ+-R) 
> —tJ —}f(t) 
a. 2 Ee se" 
where
ab 


ae". 
f(t) = (EJ — R)t - CG pi 


Let t = (EJ — R)/4ab S 1/8 80 that 1/(1 — t) /7)*, then 


(8, 
(= Rs (EJ — R)’ [1 - (5): 2 (EJ — R)’ 
; 4ab / =~ 4ab— “Sab 


completing the proof. 
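The Chernoff step of this proof can be illustrated numerically. The sketch below assumes a binary symmetric channel with uniform input (a hypothetical choice, not taken from the paper) and compares the exact value of $P(J^n \le n(EJ + R)/2)$ with the generic bound $[e^{t(EJ+R)/2} E e^{-tJ}]^n$ minimized over a grid of $t$ in $[0, 1]$. It does not use the specific constant $16ab$.

```python
import math


def chernoff_demo(p, n, gap, grid=100):
    """Exact tail probability versus Chernoff bound for a binary symmetric channel.

    p is the crossover probability, the input is uniform, and R = EJ - gap.
    Returns (exact probability, best Chernoff bound over t in [0, 1]).
    """
    j_ok = math.log(2 * (1 - p))      # J when the letter is received correctly
    j_err = math.log(2 * p)           # J when the letter is flipped
    EJ = (1 - p) * j_ok + p * j_err
    R = EJ - gap
    c = (EJ + R) / 2                  # per-letter threshold
    # J^n = (n - k) j_ok + k j_err, where k ~ Binomial(n, p) counts the flips.
    exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
                for k in range(n + 1)
                if (n - k) * j_ok + k * j_err <= n * c)

    def bound(t):
        mgf = (1 - p) * math.exp(-t * j_ok) + p * math.exp(-t * j_err)
        return (math.exp(t * c) * mgf) ** n

    best = min(bound(i / grid) for i in range(grid + 1))
    return exact, best


exact, best = chernoff_demo(p=0.1, n=50, gap=0.3)
print(exact, best)
```

Every value of $t \ge 0$ gives a valid upper bound, so taking the minimum over the grid can only tighten it; the proof instead fixes one convenient $t$ to obtain a closed form.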


< 
9 


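5. The error bound for a finite set of channels. Lemma 3 is needed in the proof
of Theorem 4.

LEMMA 3: Let $(\mathcal{A}, \mathcal{B}, P_\gamma(y \mid x))$ for $\gamma \in \mathcal{C} = \{1, 2, \dots, L\}$ be a finite class of
channels and let $Q(x)$ be a distribution on $\mathcal{A}$, determining $P_\gamma(x, y)$, $J_\gamma(x, y)$.

(a) Define a channel $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ by $P(y \mid x) = (1/L) \sum_{\gamma=1}^{L} P_\gamma(y \mid x)$
and let $Q(x)$ determine $P(x, y)$, $J(x, y)$. Then for all $a$, $\delta$

$$P(J \le a) \le \frac{1}{L} \sum_{\gamma=1}^{L} P_\gamma(J_\gamma \le a + \delta) + Le^{-\delta}.$$

(b) For any $a > 0$, $G \ge 1$, $\delta > 0$ there is a $(G, \epsilon, 1)$ code for $\mathcal{C}$ with

$$\epsilon = LGe^{-a} + L^2e^{-\delta} + \sum_{\gamma=1}^{L} P_\gamma(J_\gamma \le a + \delta).$$

PROOF: We first prove part (a).

$$P(J \le a) = \frac{1}{L} \sum_{\gamma} P_\gamma(J \le a) \le \frac{1}{L} \sum_{\gamma} \bigl[P_\gamma(J_\gamma \le a + \delta) + P_\gamma(J_\gamma > a + \delta;\ J \le a)\bigr]$$

so that we need only prove that $P_\gamma(A_\gamma) \le Le^{-\delta}$ where $A_\gamma = (J_\gamma > a + \delta;\ J \le a)$.
For any $(x, y) \in A_\gamma$ with $P_\gamma(x, y) > 0$ we have

$$e^{a}P(y) \ge P(y \mid x) \ge \frac{1}{L} P_\gamma(y \mid x) \ge \frac{1}{L} e^{a + \delta} P_\gamma(y)$$

so that $P_\gamma(y) \le Le^{-\delta}P(y)$. Summing this last inequality over all $y$ such that
there is an $x$ with $(x, y) \in A_\gamma$ we get $P_\gamma(A_\gamma) \le \sum P_\gamma(y) \le Le^{-\delta}$ which completes
the proof of part (a).

Applying Theorem 2 to the channel $P(y \mid x)$ defined in part (a) and then
using part (a) to bound $P(J \le a)$ we find that there is a $(G, \epsilon', 1)$ code for
$P(y \mid x)$ with

$$\epsilon' = Ge^{-a} + P(J \le a) \le Ge^{-a} + \frac{1}{L} \sum_{\gamma} P_\gamma(J_\gamma \le a + \delta) + Le^{-\delta}.$$

Now $P_\gamma(y \mid x) \le LP(y \mid x)$ so that if $x_i$ is an input letter for the $(G, \epsilon', 1)$ code
and $B_i$ is its decoding set, then $P_\gamma(B_i^c \mid x_i) \le L\,P(B_i^c \mid x_i) \le L\epsilon'$. Thus the
$(G, \epsilon', 1)$ code for $P(y \mid x)$ is a $(G, L\epsilon', 1)$ code for $\mathcal{C}$ and the lemma is proved.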
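THEOREM 4: Let $(\mathcal{A}, \mathcal{B}, P_\gamma(y \mid x))$ for $\gamma \in \mathcal{C} = \{1, 2, \dots, L\}$ be a finite class of
channels. For any $R > 0$ such that $0 < C - R \le 1/2$ there is an $(e^{Rn}, \epsilon_n, n)$ code
with

$$\epsilon_n = 2L^2 e^{-\frac{(C - R)^2}{16ab} n}.$$

PROOF. Applying part (b) of Lemma 3 to the class of channels $(\mathcal{A}^{(n)}, \mathcal{B}^{(n)},
P_\gamma(v \mid u))$ with $Q(u) = Q(x_1) \cdots Q(x_n)$ and $Q(x)$ a distribution for which
$C = \inf_{\gamma \in \mathcal{C}} E_\gamma J_\gamma$ and $G = e^{Rn}$, $a = (R + \theta/2)n$, $\delta = \theta n/2$ we see that there is an
$(e^{Rn}, \epsilon_n, n)$ code for $\mathcal{C}$ with

$$\epsilon_n = (L + L^2)e^{-\theta n/2} + \sum_{\gamma=1}^{L} P_\gamma\Bigl(\frac{1}{n} J^n_\gamma \le R + \theta\Bigr).$$

Let $\theta = (C - R)^2$ and note that $R + (C - R)^2 \le R + (C - R)/2 \le R +
(E_\gamma J_\gamma - R)/2 = (E_\gamma J_\gamma + R)/2$. Thus,

$$\epsilon_n \le (L + L^2)e^{-\frac{(C - R)^2}{2} n} + \sum_{\gamma=1}^{L} P_\gamma\Bigl(\frac{1}{n} J^n_\gamma \le \tfrac{1}{2}(R + E_\gamma J_\gamma)\Bigr).$$

Now

$$P_\gamma\Bigl(\frac{1}{n} J^n_\gamma \le \tfrac{1}{2}(R + E_\gamma J_\gamma)\Bigr) \le P_\gamma\Bigl(\frac{1}{n} J^n_\gamma \le \tfrac{1}{2}(R' + E_\gamma J_\gamma)\Bigr)$$

where $R' = E_\gamma J_\gamma - (C - R) \ge R$ and $0 \le E_\gamma J_\gamma - R' \le 1/2$. Therefore, we
can apply the bound obtained in the proof of Theorem 3 and get

$$P_\gamma\Bigl(\frac{1}{n} J^n_\gamma \le \tfrac{1}{2}(R' + E_\gamma J_\gamma)\Bigr) \le e^{-\frac{(E_\gamma J_\gamma - R')^2}{16ab} n} = e^{-\frac{(C - R)^2}{16ab} n}.$$

Now $L \ge 2$ so that $2L + L^2 = L(L + 2) \le 2L^2$ and since Theorem 4 reduces
to Theorem 3 for $L = 1$, the proof is completed.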


6. The direct half of Theorem 1. Lemmas 4 and 5 are needed for the proof of
part (a) of Theorem 1.

LEMMA 4: Let $\mathcal{A}$, $\mathcal{B}$ be given. For every integer $M \ge 2b^2$ there is a class of channels
$(\mathcal{A}, \mathcal{B}, P_\gamma(y \mid x))$ with $\gamma \in D_M$, where $D_M$ has at most $(M + 1)^{ab}$ elements, such that
for any channel $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ there is a channel $(\mathcal{A}, \mathcal{B}, P'(y \mid x))$ in $D_M$ such
that:

(a) $|P(y \mid x) - P'(y \mid x)| \le b/M$ for all $x$, $y$.

(b) $P(y \mid x) \le e^{2b^2/M} P'(y \mid x)$ for all $x$, $y$.

(c) For any distribution $Q(x)$ on $\mathcal{A}$ let $P(x, y) = P(y \mid x)Q(x)$, $P'(x, y) =
P'(y \mid x)Q(x)$, then

$$|EJ - E'J'| \le 2b\Bigl(\frac{b}{M}\Bigr)^{1/2}.$$

PROOF. Let $D_M$ be the class of channels $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ such that for all $x$, $y$
we have $MP(y \mid x)$ = an integer. Clearly $D_M$ has at most $(M + 1)^{ab}$ elements.
Given the distributions $P(y \mid x)$ we will first construct $P'(y \mid x)$ and prove (a),
(b). For this purpose it is enough to carry out the construction for one $x_0$.
Arrange the "b" numbers $P(y \mid x_0)$ in ascending order and designate them by
$p_1 \le p_2 \le \cdots \le p_b$. For $i = 1, \dots, (b - 1)$ select $p'_i$ uniquely by $p_i \le p'_i <
p_i + 1/M$, $Mp'_i$ = an integer. $p'_i$ will be $P'(y \mid x_0)$ with the $y$ being the one corresponding
to $p_i$. Clearly

$$p_i \le e^{2b^2/M} p'_i \quad \text{and} \quad |p_i - p'_i| \le \frac{b}{M}$$

for $i = 1, \dots, (b - 1)$. It remains to show that if $p'_b = 1 - \sum_{i=1}^{b-1} p'_i$ then
$p'_b \ge 0$ and $p_b$, $p'_b$ satisfy the same relations. Now

$$p'_b = 1 - \sum_{i=1}^{b-1} p'_i \ge 1 - \sum_{i=1}^{b-1} \Bigl(p_i + \frac{1}{M}\Bigr) \ge p_b - \frac{b}{M} \ge \frac{1}{b} - \frac{1}{2b} = \frac{1}{2b}.$$

Thus $p'_1, \dots, p'_b$ form a distribution and $p_b \ge p'_b \ge p_b - b/M$ so that

$$|p_b - p'_b| \le b/M.$$

Also

$$p_b \le p'_b + \frac{b}{M} \le p'_b\Bigl(1 + \frac{2b^2}{M}\Bigr) \le e^{2b^2/M} p'_b$$

completing the proof of parts (a) and (b).

In the proof of part (c) we will use part (a) and Lemma 1. In order to use
Lemma 1 we observe that $b/M \le 1/2b \le 1/4 < 1/e$. We also note that

$$|P(y) - P'(y)| \le \sum_x |P(y \mid x) - P'(y \mid x)|\,Q(x) \le b/M.$$

Now

$$|EJ - E'J'| \le \Bigl|\Bigl[-\sum_y P(y) \log P(y)\Bigr] - \Bigl[-\sum_y P'(y) \log P'(y)\Bigr]\Bigr|$$
$$+ \Bigl|\Bigl[-\sum_{x,y} P(x, y) \log P(y \mid x)\Bigr] - \Bigl[-\sum_{x,y} P'(x, y) \log P'(y \mid x)\Bigr]\Bigr|$$
$$\le b\Bigl(\frac{b}{M}\Bigr)^{1/2} + \sum_x Q(x) \Bigl|\Bigl[-\sum_y P(y \mid x) \log P(y \mid x)\Bigr] - \Bigl[-\sum_y P'(y \mid x) \log P'(y \mid x)\Bigr]\Bigr|$$
$$\le b\Bigl(\frac{b}{M}\Bigr)^{1/2} + b\Bigl(\frac{b}{M}\Bigr)^{1/2}$$

and the lemma is proved.
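The rounding construction in this proof is easy to try out on a single row of a channel matrix. The following sketch uses exact rational arithmetic; the function name `quantize_row` and the sample row are assumptions made for the illustration, and properties (a) and (b) are checked in the forms $|p - p'| \le b/M$ and $p \le e^{2b^2/M} p'$.

```python
from fractions import Fraction
import math


def quantize_row(row, M):
    """Round one row P(.|x0) to multiples of 1/M.

    The b - 1 smallest entries are rounded up to the next multiple of 1/M;
    the largest entry absorbs the difference so the row still sums to 1.
    """
    order = sorted(range(len(row)), key=lambda y: row[y])   # ascending order
    new = [None] * len(row)
    for y in order[:-1]:
        new[y] = Fraction(math.ceil(row[y] * M), M)
    new[order[-1]] = 1 - sum(new[y] for y in order[:-1])
    return new


row = [Fraction(1, 10), Fraction(2, 5), Fraction(3, 10), Fraction(1, 5)]
b = len(row)
M = 2 * b * b                      # smallest M allowed by the lemma
new_row = quantize_row(row, M)
print(new_row)
```

Rounding the small entries up (rather than to the nearest multiple) is what guarantees $p_i \le p'_i$ for them, so that only the largest entry needs the exponential factor in (b).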


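LEMMA 5: Let $(\mathcal{A}, \mathcal{B}, P'(y \mid x))$, $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ be two channels and $A$ a nonnegative
number such that $P(y \mid x) \le e^{A}P'(y \mid x)$ for all $x$, $y$. Any $(e^{Rn}, \epsilon_n, n)$
code for $(\mathcal{A}, \mathcal{B}, P'(y \mid x))$ is an $(e^{Rn}, \epsilon_n e^{An}, n)$ code for $(\mathcal{A}, \mathcal{B}, P(y \mid x))$.

PROOF: Let $u = (x_1, \dots, x_n) \in \mathcal{A}^{(n)}$, $v = (y_1, \dots, y_n) \in \mathcal{B}^{(n)}$. Then

$$P(v \mid u) = \prod_{i=1}^{n} P(y_i \mid x_i) \le e^{An} \prod_{i=1}^{n} P'(y_i \mid x_i) = e^{An}P'(v \mid u).$$

Thus for any subset $D$ of $\mathcal{B}^{(n)}$ and any $u \in \mathcal{A}^{(n)}$ we have

$$P(D \mid u) \le e^{An} P'(D \mid u).$$

Let $u_i \in \mathcal{A}^{(n)}$ be an input message and $B_i$ the corresponding decoding set of an
$(e^{Rn}, \epsilon_n, n)$ code for $(\mathcal{A}, \mathcal{B}, P'(y \mid x))$. Then

$$P(B_i^c \mid u_i) \le e^{An} P'(B_i^c \mid u_i) \le e^{An} \epsilon_n$$

and the proof is completed.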

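We turn now to the proof of part (a) of Theorem 1. For each $P(y \mid x) \in \mathcal{C}$
select a $P'(y \mid x) \in D_M$ according to Lemma 4 and let $\mathcal{C}'$ denote this set of channels.
Let $C' = C(\mathcal{C}')$. Since $\mathcal{C}'$ has at most $(M + 1)^{ab}$ elements we know from
Theorem 4 that if $R' > 0$, $0 \le C' - R' \le 1/2$ then there is an $(e^{R'n}, \epsilon'_n, n)$
code for $\mathcal{C}'$ with

$$\epsilon'_n = 2(M + 1)^{2ab} e^{-\frac{(C' - R')^2}{16ab} n}.$$

For each $P(y \mid x) \in \mathcal{C}$ there is a $P'(y \mid x) \in \mathcal{C}'$ such that

$$P(y \mid x) \le e^{2b^2/M} P'(y \mid x)$$

so that from Lemma 5 the code which we have for $\mathcal{C}'$ is an $(e^{R'n}, \epsilon_n, n)$ code for
$\mathcal{C}$ with

$$\epsilon_n = 2(M + 1)^{2ab} \exp\Bigl\{-n\Bigl[\frac{(C' - R')^2}{16ab} - \frac{2b^2}{M}\Bigr]\Bigr\}.$$

Let $C = C(\mathcal{C})$ and let $Q(x)$ be a maximizing distribution for $C$. We wish to show
that $C'$ cannot be very much smaller than $C$. For every $P'(y \mid x) \in \mathcal{C}'$ there is a
$P(y \mid x) \in \mathcal{C}$ such that $EJ \le E'J' + 2b(b/M)^{1/2}$ where we use $Q(x)$ in both
cases. Thus for every $P'(y \mid x) \in \mathcal{C}'$

$$C = \inf_{\mathcal{C}} EJ \le E'J' + 2b\Bigl(\frac{b}{M}\Bigr)^{1/2}$$

so that

$$C \le \inf_{\mathcal{C}'} E'J' + 2b\Bigl(\frac{b}{M}\Bigr)^{1/2} \le C' + 2b\Bigl(\frac{b}{M}\Bigr)^{1/2}.$$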

Let $R > 0$ be given such that $0 < C - R \le 1/2$. We must show how to select
$R'$ and $M$ to get our result into the final form.
We select an integer M such that 
2*ab* 2°ab* 


ee 6 ee 
— Ry =sM and (M+1)s (C— RB 


i ae” 7 


1/2 a2 2 
2» (1) <C-R R and Se he 


M 


= 2 


We define R’ by 


Cc’ — R' ) a 
Clearly $C' - R' \le 1/2$ so that we have an $(e^{R'n}, \epsilon_n, n)$ code for $\mathcal{C}$ with


(C—R)’ _ (C—R)’*\ 
4(16ab) Zab { 


2°ab we {< — R)’ 
socom) Se 
2 


The inequality $C \le C' + 2b(b/M)^{1/2}$ shows that $R' \ge R$ and an $(e^{R'n}, \epsilon_n, n)$
code for $\mathcal{C}$ can easily be reduced to an $(e^{Rn}, \epsilon_n, n)$ code for $\mathcal{C}$ so that part (a)
of Theorem 1 is proved.


én & 2(M + 1)” exp -{ 


7. Converse half of Theorem 1. The proof is based on Lemma 6.

LEMMA 6: Let $G$ be an integer, $\mathcal{A}$ a finite set and let $u_1, \dots, u_G$ be distinct elements
of $\mathcal{A}^{(n)}$. Define $Q(x)$ on $\mathcal{A}$ by

$$Q(x) = \frac{1}{nG} \sum_{i=1}^{G} (\text{the number of times that } x \text{ appears in } u_i).$$

Then any $(G, \epsilon, n)$ code, for a channel $(\mathcal{A}, \mathcal{B}, P(y \mid x))$ which uses these $u_1, \dots, u_G$
for inputs must satisfy

$$(1 - \epsilon)\log G - \log 2 \le nEJ$$

where $Q(x)$ is used to define $P(x, y)$ and $J(x, y)$.

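PROOF: Define a distribution $\nu(u)$ on $\mathcal{A}^{(n)}$ by $\nu(u) = 1/G$ if $u$ is one of $u_1, \dots,
u_G$ and $\nu(u) = 0$ otherwise. Define a distribution $P(u, v)$ on $\mathcal{A}^{(n)} \times \mathcal{B}^{(n)}$ by
$P(u, v) = P(v \mid u)\nu(u)$ where $P(v \mid u)$ is obtained from the n-extension of
$(\mathcal{A}, \mathcal{B}, P(y \mid x))$. Now define $n$ distributions on $\mathcal{A} \times \mathcal{B}$ by

$$P^{(i)}(x, y) = P(y \mid x)\nu^{(i)}(x)$$

for $i = 1, \dots, n$ where

$$\nu^{(i)}(x) = \sum_{x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n} \nu(x_1, \dots, x_{i-1}, x, x_{i+1}, \dots, x_n)$$

and observe that $Q(x) = (1/n) \sum_{i=1}^{n} \nu^{(i)}(x)$. Thus, the lemma will be proved
if the following chain of inequalities is proved.

$$(1 - \epsilon) \log G - \log 2 \le \sum_{u,v} P(u, v) \log \frac{P(u, v)}{P(u)P(v)}$$
$$\le \sum_{i=1}^{n} \sum_{x,y} P^{(i)}(x, y) \log \frac{P^{(i)}(x, y)}{P^{(i)}(x)P^{(i)}(y)} \le n \sum_{x,y} P(x, y) \log \frac{P(x, y)}{P(x)P(y)}.$$

Using $\log z = (\log 2) \log_2 z$ to convert a result from Feinstein [8], pp. 29, 39, 44;
which is due to Fano [5]; we obtain the first inequality. The second inequality
follows from page 30 of Feinstein [8]. We proceed to prove the third inequality.
Now

$$\frac{1}{n} \sum_{i=1}^{n} \sum_{x,y} P^{(i)}(x, y) \log \frac{P^{(i)}(x, y)}{P^{(i)}(x)P^{(i)}(y)} = \frac{1}{n} \sum_{i=1}^{n} \sum_{x,y} P^{(i)}(x, y) \bigl[\log P(y \mid x) - \log P^{(i)}(y)\bigr]$$
$$= \sum_{x,y} P(x, y) \log P(y \mid x) - \frac{1}{n} \sum_{i=1}^{n} \sum_{y} P^{(i)}(y) \log P^{(i)}(y)$$

but

$$-\frac{1}{n} \sum_{i=1}^{n} \sum_{y} P^{(i)}(y) \log P^{(i)}(y) \le -\sum_{y} \Bigl(\frac{1}{n} \sum_{i=1}^{n} P^{(i)}(y)\Bigr) \log \Bigl(\frac{1}{n} \sum_{j=1}^{n} P^{(j)}(y)\Bigr)$$
$$= -\sum_{y} P(y) \log P(y) = -\sum_{x,y} P(x, y) \log P(y)$$

where this last inequality follows from Lemma 4 on page 16 of Feinstein [8].
Combining the above, we complete the proof of the third inequality and hence
of the lemma.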


From Lemma 6 we immediately obtain that if $G$ is an integer then for any
$(G, \epsilon, n)$ code for a class $\mathcal{C}$ of channels there is a $Q(x)$ on $\mathcal{A}$ such that

$$(1 - \epsilon)\log G - \log 2 \le n \inf_{\gamma \in \mathcal{C}} E_\gamma J_\gamma \le nC.$$

Now $e^{Rn}$ may not be an integer but

$$\log [e^{Rn}] \ge \log (e^{Rn} - 1) = nR + \log(1 - e^{-Rn}) \ge nR - \log 2$$

so that

$$(1 - \epsilon)(nR - \log 2) \le nC + \log 2$$

which completes the proof of part (b) of Theorem 1.
INFINITE CODES FOR MEMORYLESS CHANNELS


By DAVID BLACKWELL
University of California, Berkeley


1. Introduction and summary. For a memoryless channel with finite input
alphabet $A$, finite output alphabet $B$, and probability law $p(b \mid a)$, the capacity
$C$ is defined as the maximum over all probability distributions $q$ on $A$ of

$$\sum_{a,b} q(a)p(b \mid a) \log_2 \Bigl(p(b \mid a) \Big/ \sum_{a} q(a)p(b \mid a)\Bigr).$$


Shannon [1] has obtained the following result. 

Exponential error bound. For any $C_0 < C$ there is a number $\rho < 1$ such that,
for every positive integer $N$, there is a set $S \subset A^N$ with at least $2^{C_0 N}$ elements and
a function $g$ from $B^N$ to $S$, such that, for every $s = (a_1, \dots, a_N) \in S$,

$$\sum p(b_1 \mid a_1) \cdots p(b_N \mid a_N) < 2\rho^N,$$

where the sum extends over all sequences $b_1, \dots, b_N$ for which $g(b_1, \dots, b_N) \ne s$.

Thus if the sender selects any $s \in S$ and places its letters $a_1, \dots, a_N$ successively
into the channel, and the receiver, on observing the resulting output sequence
$b_1, \dots, b_N$, decides that the input was $g(b_1, \dots, b_N)$, the probability
that he makes an error is less than $2\rho^N$, no matter what $s \in S$ was chosen. This
result may be described as follows: it is possible to transmit at any rate $C_0 < C$,
with arbitrarily small probability of error, by using block codes of sufficient
length.

We wish to draw a slightly stronger conclusion, as follows. We imagine an
infinite sequence $x = (x_1, x_2, \dots)$ of 0's and 1's, which we are required to
transmit across the channel. At time $N$, the sender will have observed the first
$[C_0 N]$ coordinates of $x$, and will place the Nth input symbol in the channel.
The receiver, having at this point observed the first $N$ channel outputs, will
estimate the first $M(N)$ coordinates of $x$. If $M(N)/C_0 N \to 1$ as $N \to \infty$ and if,
for every $x$, all but a finite number of his estimates are correct (i.e., agree with
$x$ in every coordinate estimated) with probability 1, we shall say that the channel
is being used at rate $C_0$. Our result is that, in this sense, a (memoryless)
channel can be used at any rate $C_0 < C$.

The result stated below is exactly this result, for the special case $C_0 = 1$.
The general case involves no new ideas, but only more notation, and we shall
restrict attention to the case $C_0 = 1$. The function $f_n$ of a code, as defined below,
specifies the nth channel input symbol, as a function of the first $n$ coordinates
of $x$. The number $M(n)$ is the number of $x$ coordinates to be estimated by the
receiver after observing the first $n$ output symbols, and the function $g_n$ specifies
the estimate.


We now state the result precisely. 

For any finite set $S$, we denote by $S^N$ the set of all sequences $(s_1, \dots, s_N)$,
where $s_n \in S$ for $n = 1, 2, \dots, N$. For a memoryless channel with finite input
alphabet $A$, finite output alphabet $B$, an infinite code (for transmitting at rate
1) is defined as consisting of (a) a sequence $\{f_n\}$ of functions, where $f_n$ maps
$I^n$ into $A$, and $I$ consists of the two elements 0 and 1, (b) a nondecreasing
sequence $\{M(n)\}$ of positive integers such that $M(n)/n \to 1$ as $n \to \infty$, and
(c) a sequence $\{g_n\}$ of functions, where $g_n$ maps $B^n$ into $I^{M(n)}$.

An infinite sequence $x = (x_1, x_2, \dots)$ of 0's and 1's, together with an infinite
code, defines a sequence of independent output variables $y_1, y_2, \dots$, with

$$\Pr\{y_n = b\} = p(b \mid f_n(x_1, \dots, x_n)),$$

where $p(b \mid a)$ is the probability that the output symbol of the channel is $b$,
given that the corresponding input symbol is $a$, and defines a sequence of estimated
messages $t_1, t_2, \dots$, where $t_n = g_n(y_1, \dots, y_n)$. We shall say that the
code is effective at $x$ if, with probability 1,

$$t_n = (x_1, \dots, x_{M(n)})$$

for all sufficiently large $n$, and shall say that the code is effective if it is effective
for every $x$. The result of this note is the

THEOREM: For any memoryless channel with capacity $C > 1$, there is an effective
code.


2. Proof of the theorem. Choose a number $D$ with $1 < D < C$, and let $\rho$ be
the number $< 1$ which Shannon's exponential error bound associates with transmitting
at rate $D$. Thus we can, for any positive integer $R$, transmit any $[DR]$
x-coordinates with $R$ uses of the channel, with error probability at most $2\rho^R$.
We shall divide the x-sequence into successive blocks, of length $R(1), R(2), \dots$,
where $\{R(k)\}$ is an appropriately chosen increasing sequence of positive integers.
We may use the channel, during the time the $(k + 1)$st block of x-symbols is
observed, to transmit up to $[DR(k + 1)]$ x-coordinates, among those received
to date, with error probability at most $2\rho^{R(k+1)}$. We choose to transmit the kth
block, containing $R(k)$ x-coordinates, and to repeat the first $Q(k)$ coordinates
of $x$, where $\{Q(k)\}$ is a nondecreasing sequence of nonnegative integers such
that

$$Q(k) + R(k) \le [DR(k + 1)],$$
$$Q(k) \le R(1) + \cdots + R(k - 1).$$

Since $\{R(k)\}$ is strictly increasing, $\sum_k \rho^{R(k)}$ converges, so that, with probability
1, only a finite number of errors will be committed. That is to say, the receiver,
after observing the $(k + 1)$st block of output symbols, estimates the first $Q(k)$
x-symbols, say as $u(k)$, and the kth block of x-symbols, say as $v(k)$, and we
have, with probability 1,

$$u(k) = c(k), \quad v(k) = d(k)$$

for all sufficiently large $k$, where $c(k)$ denotes the first $Q(k)$ coordinates of $x$ and
$d(k)$ denotes the kth block of x-coordinates. After observing the $(k + 1)$st block
of output symbols and making the estimates $u(k)$, $v(k)$, the receiver will have
estimated each of the first $R(1) + \cdots + R(k) = T(k)$ coordinates of $x$ at
least once. He now forms an estimate $w(k)$ of the first $T(k)$ coordinates, using
the latest estimate made on each coordinate. If

$$Q(k) = R(1) + \cdots + R(i - 1) + h, \quad 0 \le h < R(i),$$

the estimate $w(k)$ is:

$$w(k) = (u(k), v^*(i), v(i + 1), \dots, v(k)),$$

where $v^*(i)$ consists of the last $R(i) - h$ coordinates of $v(i)$. If $Q(k) \to \infty$
with $k$, so does $i$. Since, with probability 1, all $u(i)$, $v(i)$ for $i$ sufficiently large
are correct, we conclude that, with probability 1,

$$w(k) = (x_1, \dots, x_{T(k)})$$

for all sufficiently large $k$. We have thus defined a sequence $\{w(k)\}$ of estimates,
where $w(k)$ estimates the first $T(k)$ coordinates of $x$ after $T(k + 1)$ outputs
have been received, such that, with probability 1, all but a finite number of
$w(k)$ are correct.

For $n < T(2)$, we define $g_n$ arbitrarily; for $T(k + 1) \le n < T(k + 2)$, we
define $g_n$ as $w(k)$. Thus, for $T(k + 1) \le n < T(k + 2)$, we have $M(n) =
T(k)$, and $M(n)/n \to 1$ as $n \to \infty$ if $T(k)/T(k + 2) \to 1$ as $k \to \infty$.

In summary, any two sequences $\{R(k)\}$, $\{Q(k)\}$ can be used to define an effective
code, if

(1) $\{R(k)\}$ is a strictly increasing sequence of positive integers.

(2) $\{Q(k)\}$ is a nondecreasing sequence of nonnegative integers.

(3) $Q(k) + R(k) \le [DR(k + 1)]$.

(4) $Q(k) \le R(1) + \cdots + R(k - 1)$.

(5) $Q(k) \to \infty$ as $k \to \infty$.

(6) $(R(1) + \cdots + R(k))/(R(1) + \cdots + R(k + 2)) \to 1$ as $k \to \infty$.

The sequences $R(k) = k$, $Q(k) = [\min(1, D - 1)(k - 1)]$, for instance,
satisfy (1) $\cdots$ (6).

This completes the proof. 
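The conditions (1) to (6) on the example sequences $R(k) = k$, $Q(k) = [\min(1, D - 1)(k - 1)]$ can be checked mechanically for a range of $k$. This is a minimal sketch: the value $D = 3/2$, the cutoff `K`, and the function name are assumptions chosen for the illustration, and conditions (5) and (6), being limits, are only inspected at a single large $k$.

```python
from fractions import Fraction
import math


def check_sequences(D, K):
    """Verify conditions (1)-(4) for k = 1..K and report Q(K), T(K)/T(K+2)."""

    def R(k):
        return k

    def Q(k):
        return math.floor(min(1, D - 1) * (k - 1))

    def T(k):
        return k * (k + 1) // 2          # R(1) + ... + R(k)

    for k in range(1, K + 1):
        assert R(k + 1) > R(k) > 0                          # (1)
        assert Q(k + 1) >= Q(k) >= 0                        # (2)
        assert Q(k) + R(k) <= math.floor(D * R(k + 1))      # (3)
        assert Q(k) <= T(k - 1)                             # (4)
    # (5) and (6) are limits; return values that should grow / approach 1.
    return Q(K), Fraction(T(K), T(K + 2))


q_large, ratio = check_sequences(Fraction(3, 2), 1000)
print(q_large, float(ratio))
```

Using `Fraction` for $D$ keeps the integer-part computation exact, which matters because condition (3) compares integers against $[DR(k+1)]$.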

It would be desirable to extend the theorem to finite-state channels. The method 
of this paper relies on Shannon’s exponential error bounds, and such bounds 
are not yet known for general finite-state channels. 
NOTES


A PROOF OF WALD'S THEOREM ON CUMULATIVE SUMS


By N. L. JOHNSON
University College London


1. Introduction. In the theory of sequential analysis developed by Wald [1], 
there occurs a theorem, one form of which can be expressed as follows: 


THEOREM 1. If

(i) $z_1, z_2, z_3, \dots$ are independent random variables with common expected value
$\mathcal{E}(z_i) = \mu$,

(ii) $\mathcal{E}(|z_i|) \le A < \infty$ for all $i$, and some finite $A$,

(iii) $n$ is a random variable taking values $1, 2, 3, \dots$ with probabilities $P_1, P_2,
P_3, \dots$ respectively, and

(iv) the event $\{n \ge i\}$ depends only on $z_1, z_2, \dots, z_{i-1}$,

then, setting $Z_n = \sum_{i=1}^{n} z_i$,

$$\mathcal{E}(Z_n) = \mu\,\mathcal{E}(n).$$

This note presents a simple proof of this theorem. It appears to be an abbreviated
form of an argument due to Wolfowitz [2].

In Sections 3 and 4 of this note an extension of the method to the evaluation 
of the variance of n is discussed. 


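2. Proof of Theorem 1. Let $y_i = 1$ if $z_i$ is observed (i.e. if the event $\{n \ge i\}$
occurs) and $y_i = 0$ if $z_i$ is not observed, so that

$$\Pr\{y_i = 1\} = \Pr\{n \ge i\} = \sum_{j=i}^{\infty} P_j.$$

Then $Z_n = \sum_{i=1}^{\infty} y_i z_i$ and $\mathcal{E}(Z_n) = \mathcal{E}\bigl(\sum_{i=1}^{\infty} y_i z_i\bigr) = \sum_{i=1}^{\infty} \mathcal{E}(y_i z_i)$ since
$\sum_{i=1}^{\infty} \mathcal{E}(|y_i z_i|) \le A\,\mathcal{E}(n) < \infty$. By reason of (iv),

$$\mathcal{E}(y_i z_i) = \mathcal{E}(y_i)\mathcal{E}(z_i),$$

so

$$\mathcal{E}(Z_n) = \sum_{i=1}^{\infty} \mathcal{E}(y_i)\mathcal{E}(z_i) = \mu \sum_{i=1}^{\infty} \mathcal{E}(y_i)$$
$$= \mu \sum_{i=1}^{\infty} (P_i + P_{i+1} + \cdots) = \mu \sum_{i=1}^{\infty} iP_i = \mu\,\mathcal{E}(n).$$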
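The identity $\mathcal{E}(Z_n) = \mu\,\mathcal{E}(n)$ can be verified exactly on a small example by enumerating every sample path. The sketch below assumes a hypothetical two-valued step distribution and a stopping rule of the sequential type (stop when the cumulative sum first leaves an interval, or at a fixed horizon); none of these choices come from the note itself.

```python
from fractions import Fraction
from itertools import product


def stopped_moments(values, probs, lower, upper, horizon):
    """Exact mu, E(n) and E(Z_n) for a truncated two-barrier stopping rule.

    Sampling continues while lower < Z_i < upper and stops at `horizon` at
    the latest, so the event {n >= i} depends only on z_1, ..., z_{i-1}.
    """
    mu = sum(v * p for v, p in zip(values, probs))
    e_n = Fraction(0)
    e_zn = Fraction(0)
    for path in product(range(len(values)), repeat=horizon):
        weight = Fraction(1)
        for idx in path:
            weight *= probs[idx]
        total, n = 0, horizon
        for i, idx in enumerate(path, start=1):
            total += values[idx]
            if not (lower < total < upper):
                n = i                 # first exit from the continuation region
                break
        e_n += weight * n
        e_zn += weight * total        # total equals Z_n at the stopping time
    return mu, e_n, e_zn


mu, e_n, e_zn = stopped_moments(
    values=(-1, 2),
    probs=(Fraction(1, 2), Fraction(1, 2)),
    lower=-2, upper=3, horizon=8,
)
print(mu, e_n, e_zn, e_zn == mu * e_n)
```

Because the arithmetic is done with rationals, the comparison is an exact equality rather than a Monte Carlo approximation.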
3. An analogous second moment theorem.

THEOREM 2. If we assume, in addition to (i)-(iv), that

(v) $\operatorname{var}(z_i) = \mathcal{E}(z_i^2) - \mu^2 = \sigma^2 < \infty$, with the same value for all $i$,

(vi) $\mathcal{E}[(z_j - \mu)^2 \mid n \ge i] \le B < \infty$ for all $j < i$, with $B$ independent of $i$,

(vii) $\mathcal{E}(n^2) = \sum_{i=1}^{\infty} i^2 P_i < \infty$,

then, setting $Z'_n = Z_n - n\mu$,

$$\mathcal{E}(Z_n'^2) = \sigma^2\,\mathcal{E}(n).$$


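PROOF. Let $z'_i = z_i - \mu$, so $\mathcal{E}(z_i'^2) = \sigma^2$. Then $Z'_n = \sum_{i=1}^{n} z'_i$ and $\mathcal{E}(Z_n'^2) =
\mathcal{E}\bigl(\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} y_i z'_i y_j z'_j\bigr)$. Since

$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \mathcal{E}(|y_i z'_i y_j z'_j|) = \sum_{i=1}^{\infty} \mathcal{E}(y_i z_i'^2) + 2 \sum_{i=1}^{\infty} \sum_{j=1}^{i-1} \mathcal{E}(y_i |z'_i z'_j|)$$
$$= \sigma^2\mathcal{E}(n) + 2 \sum_{i=1}^{\infty} \sum_{j=1}^{i-1} \mathcal{E}\bigl(y_i\,\mathcal{E}(|z'_i z'_j| \mid n \ge i)\bigr)$$
$$\le \sigma^2\mathcal{E}(n) + 2 \sum_{i=1}^{\infty} \sum_{j=1}^{i-1} \mathcal{E}(y_i)\bigl[\mathcal{E}(z_i'^2)\,\mathcal{E}(z_j'^2 \mid n \ge i)\bigr]^{1/2}$$
$$\le \sigma^2\mathcal{E}(n) + 2\sigma B^{1/2} \sum_{i=1}^{\infty} \sum_{j=1}^{i-1} \mathcal{E}(y_i)$$
$$= \sigma^2\mathcal{E}(n) + 2\sigma B^{1/2} \cdot \tfrac{1}{2} \sum_{i=1}^{\infty} i(i - 1)P_i$$
$$= \sigma^2\mathcal{E}(n) + \sigma B^{1/2} [\mathcal{E}(n^2) - \mathcal{E}(n)] < \infty,$$

we can invert the order of summation and expectation, giving

$$\mathcal{E}(Z_n'^2) = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \mathcal{E}(y_i z'_i y_j z'_j)$$
$$= \sum_{i=1}^{\infty} \mathcal{E}(y_i z_i'^2) + 2 \sum_{i=1}^{\infty} \sum_{j=1}^{i-1} \mathcal{E}(y_i z'_i z'_j)$$
$$= \sum_{i=1}^{\infty} \mathcal{E}(y_i)\mathcal{E}(z_i'^2) + 2 \sum_{i=1}^{\infty} \sum_{j=1}^{i-1} \mathcal{E}(y_i z'_j)\mathcal{E}(z'_i)$$
$$= \sigma^2\mathcal{E}(n),$$

since $\mathcal{E}(z'_i) = 0$, and by reason of (iv).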
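The second moment identity can be checked in the same exact way as the first. The sketch below is an illustration under assumed inputs: a hypothetical three-point step distribution and a truncated two-barrier stopping rule, for which the conditions of the theorem hold trivially because everything is bounded.

```python
from fractions import Fraction
from itertools import product

# Hypothetical step distribution used only for this check.
values = (-1, 0, 2)
probs = (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))
mu = sum(v * p for v, p in zip(values, probs))
sigma2 = sum(v * v * p for v, p in zip(values, probs)) - mu * mu


def second_moment_check(lower, upper, horizon):
    """Return exact (E(n), E((Z_n - n mu)^2)) by enumerating all paths."""
    e_n = Fraction(0)
    e_sq = Fraction(0)
    for path in product(range(len(values)), repeat=horizon):
        weight = Fraction(1)
        for idx in path:
            weight *= probs[idx]
        total, n = 0, horizon
        for i, idx in enumerate(path, start=1):
            total += values[idx]
            if not (lower < total < upper):
                n = i                 # stop at the first exit from (lower, upper)
                break
        e_n += weight * n
        e_sq += weight * (total - n * mu) ** 2
    return e_n, e_sq


e_n, e_sq = second_moment_check(lower=-2, upper=3, horizon=6)
print(sigma2, e_n, e_sq, e_sq == sigma2 * e_n)
```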


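4. The variance of n. If, now, we make the assumption (viii) $\mathcal{E}(Z_n \mid n)$ is
independent of $n$, we have, using Theorem 2,

$$\sigma^2\mathcal{E}(n) = \mathcal{E}(Z_n'^2) = \mathcal{E}[(Z_n - n\mu)^2]$$
$$= \mathcal{E}(Z_n^2) - 2\mathcal{E}(n)\mu\,\mathcal{E}(Z_n) + \mathcal{E}(n^2)\mu^2$$
$$= \mathcal{E}(Z_n^2) - 2\mu^2[\mathcal{E}(n)]^2 + \mathcal{E}(n^2)\mu^2$$

(using Theorem 1).
Hence

$$\mathcal{E}(n^2) = [\sigma^2\mathcal{E}(n) - \mathcal{E}(Z_n^2)]\mu^{-2} + 2[\mathcal{E}(n)]^2$$

or

$$\operatorname{var}(n) = [\sigma^2\mathcal{E}(n) - \mathcal{E}(Z_n^2)]\mu^{-2} + [\mathcal{E}(n)]^2 = [\sigma^2\mathcal{E}(n) - \operatorname{var}(Z_n)]\mu^{-2}.$$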


5. Concluding remarks. Theorem 2 has been stated in [3] with the weaker 
condition 


3 
(vi)’ &(n*) < ~ 


in place of conditions (vi) and (vii), but an error in the proof was pointed out 
in [4]. 


Conditions (vi) and (vii) may be replaced by either 
(vi)” g(n’*) < @ (6 > 0) 


or 


(yay"" taiVP; < @. 


Condition (vi) is certainly satisfied if the event $\{n \ge i\}$ is equivalent to
$a < Z_j < b$ for all $j < i$. For then we must have $|z_j - \mu| < b - a + |\mu|$ and
so $\mathcal{E}[(z_j - \mu)^2 \mid n \ge i] < (b - a + |\mu|)^2$. This condition is therefore satisfied
in standard sequential procedures, which have continuation regions of form
$a < Z_j < b$. Condition (vii) is also satisfied by such procedures when (v) is
satisfied (see [5]).

I would like to thank the referee and Professor W. Hoeffding for help in the
presentation of this paper; in particular, they suggested the alternative conditions
(vi)'' and (vi)'''.
A NOTE ON MULTIPLE INDEPENDENCE UNDER MULTIVARIATE
NORMAL LINEAR MODELS


By V. P. BHAPKAR
University of North Carolina


1. Introduction. S. N. Roy and Bargmann [3] used S. N. Roy's union-intersection
method as the basis for providing tests and confidence intervals in the
following cases:

i) $y' = (y_1, \dots, y_p) \sim N(\mu', \Sigma)$, $H_0: \sigma_{ij} = 0$, $i \ne j$.

ii) $y' \sim N(\mu', \Sigma)$, but $y'$ is partitioned into $k$ sets or blocks of sizes $p_1, \dots,
p_k$. $H_0: \Sigma_{ij} = 0$, $i \ne j$, where $\Sigma_{ij}$ is the covariance matrix between blocks
$i$ and $j$.

J. Roy [1] considered the following additional cases:

iii) $Y: n \times p$, $(y_{j1}, \dots, y_{jp}) \sim N(-, \Sigma)$, $j = 1, \dots, n$, $EY = A\Theta$.
$A: n \times m$ has rank $r \le n - p$ and is known, $\Theta$ is unknown. Let $\Phi = B\Theta$
be estimable, $B: t \times m$. $H_0: \Phi = 0$.

iv) $(y_1, \dots, y_p) \sim N(\mu', \Sigma)$. $H_0: \Sigma = \Sigma_0$ (specified).

v) $(y_1, \dots, y_p) \sim N(\mu', \Sigma_1)$; $(x_1, \dots, x_p) \sim N(\nu', \Sigma_2)$, $H_0: \Sigma_1 = \Sigma_2$.

In this note we shall consider the following modification of (iii):

vi) $Y: n \times p$, $(y_{j1}, \dots, y_{jp}) \sim N(-, \Sigma)$, $j = 1, \dots, n$, $EY = A\Theta$ (as in
(iii)). $H_0: \sigma_{ij} = 0$, $i \ne j$.


2. Step-down procedure to test $H_0$ in (vi). In the notation of [1], denote the
ith columns of the matrices $Y$ and $\Theta$ by $y_i$ and $\theta_i$ respectively and write

$$Y_i = [y_1, \dots, y_i], \quad \Theta_i = [\theta_1, \dots, \theta_i] \quad \text{and} \quad \Sigma_i = (\sigma_{jk}), \ j, k = 1, \dots, i.$$

If $Y_i$ is fixed, the $n$ elements of $y_{i+1}$ are distributed independently and normally
with the same variance $\sigma^2_{i+1}$ and expectations given by

(1) $E(y_{i+1} \mid Y_i) = A\eta_{i+1} + Y_i\beta_i$,

where $\beta_i' : 1 \times i$ is the row vector,

(2) $\beta_i' = (\sigma_{1,i+1}, \dots, \sigma_{i,i+1})\Sigma_i^{-1}$,

and $\eta_{i+1} : m \times 1$ is the column vector given by

(3) $\eta_{i+1} = \theta_{i+1} - \Theta_i\beta_i$, $\quad i = 1, \dots, p - 1$.

We note that $H_0$ is true if and only if the hypothesis $H_{0i}: \beta_i = 0$ holds for all
$i = 1, \dots, p - 1$. Now the elements of the vectors $\beta_i$ and $\eta_{i+1}$ may be regarded
as unknown parameters, and hence, when $Y_i$ is fixed, the hypothesis $H_{0i}: \beta_i = 0$
is a linear hypothesis in univariate analysis with the linear model given by (1).

We observe that rank $Y_i = i$, a.e. and rank $(A : Y_i) = r + i$, a.e. Hence
$\beta_i$ is estimable and the hypothesis $H_{0i}$ is testable. Let $\hat\beta_i$ be the Gauss-Markov
estimator of $\beta_i$ in the conditional set-up. Denote the variance-covariance matrix
of $\hat\beta_i$ by $\sigma^2_{i+1}C_i$ where $C_i : i \times i$ is a positive-definite matrix. Let $s_i^2/(n - r - i)$
denote the usual error mean square giving an unbiased estimator of $\sigma^2_{i+1}$. Then,
as in [1],

(4) $\displaystyle F_i = \frac{(\hat\beta_i - \beta_i)' C_i^{-1} (\hat\beta_i - \beta_i)/i}{s_i^2/(n - r - i)}, \quad i = 1, \dots, p - 1,$

has the F distribution with $i$ and $n - r - i$ degrees of freedom.

Thus the conditional distribution of $F_i$, given $Y_i$, does not involve $Y_i$ and
hence does not involve $F_1, \dots, F_{i-1}$. Therefore, the statistics $F_1, \dots, F_{p-1}$
have independent F distributions with degrees of freedom $i$ and $n - r - i$,
$i = 1, \dots, p - 1$ respectively.

For a preassigned constant $\alpha_i$, $0 < \alpha_i < 1$, let $f_i$ denote the upper $100\alpha_i$
percent point of the F distribution with $i$ and $n - r - i$ degrees of freedom.
Then the probability that simultaneously

(5) $F_i \le f_i, \quad i = 1, \dots, p - 1,$

is equal to $\prod_{i=1}^{p-1} (1 - \alpha_i)$.

Since $H_0$ is equivalent to $H_{0i}: \beta_i = 0$, $i = 1, \dots, p - 1$, we utilize (4) and propose the
following test procedure for $H_0$:
Accept $H_0$, if

(6) $\displaystyle u_i = \frac{\hat\beta_i' C_i^{-1} \hat\beta_i/i}{s_i^2/(n - r - i)} \le f_i \quad \text{for all } i = 1, \dots, p - 1;$

otherwise reject $H_0$.

To carry out the test one should first compute $u_1$. If $u_1 > f_1$, $H_0$ is rejected.
If $u_1 \le f_1$, $u_2$ is computed. If $u_2 > f_2$, $H_0$ is rejected. If $u_2 \le f_2$, $u_3$ is computed
and so on. The level of significance for this test is obviously $1 - \prod_{i=1}^{p-1} (1 - \alpha_i)$.
One possibility is to choose $\alpha_1 = \cdots = \alpha_{p-1}$. We prefer choosing $\alpha$'s so that
$f_1 = \cdots = f_{p-1}$, for reasons discussed in [3].
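The sequential logic of the step-down test and its overall significance level can be sketched as follows. This is only an illustration: the statistics $u_i$ and critical values $f_i$ are taken as given numbers (computing them requires the regression quantities in (6) and an F table), and the function names are hypothetical.

```python
import math


def overall_level(alphas):
    """Significance level 1 - prod(1 - alpha_i) of the step-down test."""
    return 1 - math.prod(1 - a for a in alphas)


def equal_alphas(alpha, steps):
    """Common per-step level that yields overall level `alpha` over `steps` steps."""
    return [1 - (1 - alpha) ** (1 / steps)] * steps


def step_down(u, f):
    """Return the first step i (1-based) with u_i > f_i, or None if H0 is accepted.

    Later statistics are never examined once a rejection occurs, mirroring
    the order of computation described in the text.
    """
    for i, (u_i, f_i) in enumerate(zip(u, f), start=1):
        if u_i > f_i:
            return i
    return None


alphas = equal_alphas(0.05, steps=4)           # p - 1 = 4 steps, assumed
print(alphas[0], overall_level(alphas))
print(step_down([1.0, 5.0, 0.2], [3.0, 3.0, 3.0]))
```

The equal-level choice shown here corresponds to the option $\alpha_1 = \cdots = \alpha_{p-1}$; the alternative of equal critical values $f_i$ would need the F quantiles and is not sketched.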


3. Confidence bounds associated with the test.

Now from (4), $F_i \le f_i \Rightarrow (\hat\beta_i - \beta_i)'(\hat\beta_i - \beta_i) \le \lambda_{\max}(C_i)\,l_i s_i^2$ where $l_i =
i f_i/(n - r - i)$ and $\lambda_{\max}(C_i)$ is the maximum characteristic root of $C_i$. Hence,
in view of (5), with a probability greater than $\prod_{i=1}^{p-1} (1 - \alpha_i)$,

(7) $(\hat\beta_i - \beta_i)'(\hat\beta_i - \beta_i) \le \lambda_{\max}(C_i)\,l_i s_i^2, \quad i = 1, \dots, p - 1.$

Now (7) implies

(8) $a_i'\hat\beta_i - [l_i s_i^2 \lambda_{\max}(C_i)]^{1/2} \le a_i'\beta_i \le a_i'\hat\beta_i + [l_i s_i^2 \lambda_{\max}(C_i)]^{1/2}$

for all non-null $a_i : i \times 1$ such that $a_i'a_i = 1$, $(i = 1, \dots, p - 1)$. This again
implies

(9) $(\hat\beta_i'\hat\beta_i)^{1/2} - [l_i s_i^2 \lambda_{\max}(C_i)]^{1/2} \le (\beta_i'\beta_i)^{1/2} \le (\hat\beta_i'\hat\beta_i)^{1/2} + [l_i s_i^2 \lambda_{\max}(C_i)]^{1/2}, \quad i = 1, \dots, p - 1.$

Thus (9) holds with probability greater than $\prod_{i=1}^{p-1} (1 - \alpha_i)$. We may obtain
partial statements by choosing some elements of $a_i$ in (8) to be zero. Thus
we have the simultaneous confidence bounds given by (9) for all possible subsets
of $\beta_i$ for all $i = 1, \dots, p - 1$ with the confidence coefficient greater than
$\prod_{i=1}^{p-1} (1 - \alpha_i)$.

4. Remarks. 


(a) It will be easily seen that when Y represents a random sample of size n 
from N(w, 2), (1) takes the form 


E(yine| Yi) = wins + 2, Biya — 5), 
j= 
where y; = (ya, sos » Yin) and 6; = — (Ba , et » Bis), i= 1, “em 1. 


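If we write $s_{jk} = \sum_{\alpha=1}^{n} (y_{j\alpha} - \bar y_j)(y_{k\alpha} - \bar y_k)$ and $S_i = (s_{jk})$, $j, k = 1, \ldots, i$, then it is well known that

$\hat\beta_i = S_i^{-1}(s_{i+1,1}, \ldots, s_{i+1,i})' = b_i, \qquad C_i = S_i^{-1}$

and

$s_i^2 = s_{i+1,i+1} - (s_{i+1,1}, \ldots, s_{i+1,i})\, S_i^{-1}\, (s_{i+1,1}, \ldots, s_{i+1,i})',$

so that

$\dfrac{b_i' S_i b_i / i}{s_i^2/(n - 1 - i)} = \dfrac{r^2_{i+1 \cdot 1, \ldots, i}}{1 - r^2_{i+1 \cdot 1, \ldots, i}} \cdot \dfrac{n - 1 - i}{i},$

where $r_{i+1 \cdot 1, \ldots, i}$ denotes the multiple correlation coefficient of $(i + 1)$ with $(1, \ldots, i)$, thus giving as a special case the test procedure already obtained in [3]. This is, of course, as it should be.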

(b) In this model, as in (iii), it is of interest to investigate whether the test 
of the usual multivariate linear hypothesis of the type 


(10) Ho: = Be = 0, 


where ® is estimable, and the above test of independence are quasi-independent 


(see e.g. Roy [2]). As shown in [1], the step-down test procedure for (10) gives, 
when J, is fixed, 


(11) Pr, me (iss — ins) Ditu(Gees — dss)/t 


=G@1,--- ¢<1 
3/n—r-1 0, P 
where ®;,, = Bn; ; and the variance-covariance matrix of bia. is Dix:0741 . 

F; given by (4) and F; given by (11) are, for fixed Y; , quasi-independent 
if the numerators, which are marginally distributed as xjo74,/i and xio74,/t re- 
spectively, are independent. It can be easily verified that xj and x? are not 
independent and hence the tests for H, and H, are not quasi-independent. It 
may be noted that, when Y; is fixed, the test of 8; = 0 is like testing significance 
of regression, as seen from (1), while the test of ®;,, = 0 is like covariance- 
analysis. 


(c) We may consider extension of (vi) to blocks, as in (ii), and test 
He: 23; = 0, 


as pointed out by the referee. It is easy to check that a similar step-down procedure with respect to blocks will result in k — 1 independent tests in multivariate analysis of variance of the same general structure as in [1] and [3].


5. Acknowledgement. I am indebted to Professor S. N. Roy for suggesting this problem and to the referee for suggesting improvements in structure and exposition.
NON-MARKOVIAN PROCESSES WITH THE SEMIGROUP PROPERTY


By William Feller


Princeton University


1. Introduction. Every $N \times N$ stochastic matrix $P$ defines the transition probabilities of a Markovian process with positive discrete time parameter. Its $n$-step transition probabilities satisfy the Chapman-Kolmogorov, or semigroup, relation $P^{m+n} = P^m P^n$. We shall show that for $N \ge 3$ there exist non-Markovian processes with $N$ states whose transition probabilities satisfy the same equation.² All elements of $P$ will equal $N^{-1}$. The process may be chosen strictly stationary. A simple modification leads to non-Markovian processes with continuous time parameter and the semigroup property with $N$ states or a continuum of states.

The triviality of the following example should not obscure the interest of the 
problem concerning the existence of non-Markovian processes satisfying the 
Chapman-Kolmogorov equation. As so many other basic problems in probability 
theory, it has been formulated by P. Lévy who with his usual ingenuity gave the 
first counter-example to the obvious conjecture. 


2. Let $S_1$ be the sample space whose points $(x^{(1)}, \ldots, x^{(N)})$ are the random permutations of $(1, 2, \ldots, N)$, each carrying probability $1/N!$. Let $S_2$ be the set of the $N$ points $(x^{(1)}, \ldots, x^{(N)})$ such that $x^{(i)} = \nu$ for all $1 \le i \le N$, where $\nu$ is a fixed integer, $1 \le \nu \le N$; each point of $S_2$ carries probability $1/N$. Finally, let $S$ be the mixture of $S_1$ and $S_2$ with $S_1$ carrying weight $1 - N^{-1}$ and $S_2$ weight $N^{-1}$.

More formally, $S$ contains the $N! + N$ arrangements $(x^{(1)}, x^{(2)}, \ldots, x^{(N)})$ which represent either a permutation of $(1, 2, \ldots, N)$ or the $N$-fold repetition of an integer $\nu$, $1 \le \nu \le N$. To each point of the first class we attribute probability $(1 - N^{-1})(N!)^{-1}$, to each point of the second class probability $N^{-2}$.

Then clearly

(1)  $P\{x^{(i)} = \nu\} = N^{-1}, \qquad P\{x^{(i)} = \nu,\ x^{(j)} = \mu\} = N^{-2}$

for all $i \ne j$. Thus all transition probabilities are equal:

(2)  $P\{x^{(i)} = \nu \mid x^{(j)} = \mu\} = N^{-1}.$

Given, say, that $x^{(1)} = 1$, $x^{(2)} = 1$ the probability that $x^{(3)} \ne 1$ is zero, and hence the process is non-Markovian.
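
For small $N$ the construction can be checked by brute-force enumeration. The following Python sketch (helper names are illustrative, not part of the note) builds the mixture sample space with exact rational weights and exposes the probabilities needed to verify (1), (2), and the failure of the Markov property.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def feller_space(N):
    """Return {point: probability} for the mixture sample space S."""
    space = {}
    # Permutations share the total weight 1 - 1/N equally.
    w_perm = (1 - Fraction(1, N)) / factorial(N)
    for p in permutations(range(1, N + 1)):
        space[p] = w_perm
    # The N constant arrangements carry probability N**-2 each.
    for v in range(1, N + 1):
        space[(v,) * N] = Fraction(1, N * N)
    return space


def prob(space, event):
    """Probability of the set of points on which event(point) is true."""
    return sum(pr for pt, pr in space.items() if event(pt))


def conditional(space, event, given):
    """Conditional probability P(event | given)."""
    return prob(space, lambda x: event(x) and given(x)) / prob(space, given)
```

For $N = 3$ every pair of coordinates takes each pair of values with probability $1/9$, yet conditioning on two equal coordinates forces the third, which is exactly the non-Markovian behaviour described above.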


3. To extend the process to all integral values of the time parameter consider, in the usual manner, a double infinity of independent repetitions of the described sample space. In other words, we consider the product space $\cdots \times S \times S \times \cdots$ with product measure; its points are the doubly infinite sequences $x = \{x^{(k)}\}$ such that for each integer $r$ the $N$-dimensional block $(x^{(rN+1)}, x^{(rN+2)}, \ldots, x^{(rN+N)})$ represents the projection of $x$ onto a coordinate space $S$. This represents a non-Markovian process with the stationary transition probabilities (2). However, the process itself is not stationary in view of the periodicity modulo $N$.

To obtain a stationary process of the same type it suffices to introduce $N$ replicas of our process with time shifts $0, 1, 2, \ldots, N - 1$ and define a new process as their mixture with equal weights.


Received February 4, 1959.
² [Added in proof.] D. Blackwell has pointed out to me that the variables of our process represent a sequence of random variables which are pairwise independent without being mutually independent.


4. To construct a process of a similar character defined for all $t \ge 0$ consider the above discrete process and a Poisson process $\{N(t)\}$ with mean $t$ independent of it. Define a new process by

(3)  $x(t) = x^{(N(t))}.$

Its absolute probabilities are given by

(4)  $P\{x(t) = \nu\} = \sum_{k=0}^{\infty} P\{N(t) = k\} \cdot P\{x^{(k)} = \nu\} = N^{-1}.$

The joint probabilities for $0 < s < t$ are calculated in like manner, but the possibility that $N(t) = N(s)$ makes it necessary to consider separately the cases $\nu = \mu$ and $\nu \ne \mu$. Clearly

(5)  $P_{\nu\mu}(t) = P\{x(s + t) = \mu \mid x(s) = \nu\} = N^{-1}(1 - e^{-t})$, if $\nu \ne \mu$,
     $P_{\nu\nu}(t) = e^{-t} + N^{-1}(1 - e^{-t}),$

and a simple calculation shows that

(6)  $P_{\nu\mu}(s + t) = \sum_{\lambda=1}^{N} P_{\nu\lambda}(s)\, P_{\lambda\mu}(t)$

even though the process is clearly non-Markovian.
Finally, one might replace the N states by N intervals and an appropriate motion within them.
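
The "simple calculation" behind the semigroup relation for the continuous-time transition probabilities can be checked numerically. The sketch below is a minimal illustration with hypothetical helper names: it builds the matrix with off-diagonal entries $N^{-1}(1 - e^{-t})$ and diagonal entries $e^{-t} + N^{-1}(1 - e^{-t})$, and multiplies two such matrices.

```python
from math import exp


def transition_matrix(N, t):
    """Matrix with entries N**-1 (1 - e**-t), plus e**-t on the diagonal."""
    off = (1.0 - exp(-t)) / N
    return [[off + (exp(-t) if i == j else 0.0) for j in range(N)]
            for i in range(N)]


def matmul(A, B):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def max_abs_diff(A, B):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))
```

Writing the matrix as $e^{-t} I + (1 - e^{-t}) J/N$, with $J$ the matrix of ones and $(J/N)^2 = J/N$, makes the identity transparent; the code only confirms it numerically.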
REFERENCE 


[1] P. Lévy, "Exemples de processus pseudo-Markoviens," Comptes Rendus Académie Sciences, Paris, Vol. 228 (1949), pp. 2004-2006.





SUCCESSIVE RECURRENCE TIMES IN A STATIONARY PROCESS 
By Shu-Teh Chen Moy


Wayne State University 


Let $X_0, X_1, X_2, \ldots$ be a stationary sequence of random variables. Let $B$ be any linear Borel set for which $P(X_0 \in B) > 0$. We are concerned with the successive recurrence times $\nu_1, \nu_2, \ldots$ of $B$, their time averages and their expectations. Without loss of generality, we shall assume the basic probability space $\Omega$ to be the collection of all sequences $\omega = \{\ldots, x_{-1}, x_0, x_1, \ldots\}$ and $X_n$ to be the coordinate variables, i.e., $X_n(\omega) = x_n$. Let $T$ be the shift transformation: the $n$th coordinate of $T\omega$ is the $(n + 1)$th coordinate of $\omega$. Then $T$ is 1-1 and preserves the probability measure $P$. For any

$\omega = \{\ldots, x_{-1}, x_0, x_1, \ldots\},$

if there are infinitely many positive integers $n$ with $x_n \in B$, let

$\nu_1, \quad \nu_1 + \nu_2, \quad \nu_1 + \nu_2 + \nu_3, \quad \ldots$

be the successive positive integers for which $x_{\nu_1 + \cdots + \nu_k} \in B$. If there are finitely many, say $K$, positive integers $n$ with $x_n \in B$, define $\nu_1, \ldots, \nu_K$ as before but define $\nu_{K+1} = \nu_{K+2} = \cdots = \infty$. In this paper, Theorem 1 is concerned with the time average of the successive recurrence times, the $\nu$'s. In Theorem 2 the successive recurrence times are proved to be stationary given $X_0 \in B$. Theorem 3 may be considered as a generalization of a theorem of M. Kac in which he proved the formula (7) for the first recurrence time $\nu_1$ ([2], p. 1006).
THEOREM 1: For almost all $\omega$

(1)  $\lim_{k \to \infty} \dfrac{\nu_1(\omega) + \cdots + \nu_k(\omega)}{k}$

exists. The limit may be finite or infinite. It is finite for almost all $\omega \in E$ where $E = [X_0 \in B]$. In particular, if $T$ is ergodic, the limit is equal to $1/P(E)$ with probability one.

PROOF: Let $I_E$ be the indicator function of $E$, i.e., $I_E(\omega) = 1$ if $\omega \in E$ and $I_E(\omega) = 0$ if $\omega \notin E$. By the ergodic theorem, for almost all $\omega$

(2)  $\lim_{n \to \infty} \dfrac{I_E(T\omega) + \cdots + I_E(T^n\omega)}{n} = f(\omega),$

where $f(\omega) > 0$ for almost all $\omega \in E$. If $T$ is ergodic, $f(\omega) = P(E)$.

In fact, $[I_E(T\omega) + \cdots + I_E(T^n\omega)]\, n^{-1}$ is the relative frequency of occurrence of $B$. If the limit of the relative frequency, as $n \to \infty$, is positive, $B$ must occur infinitely often; therefore, $\nu_1(\omega), \nu_2(\omega), \ldots$ are all finite. Thus all successive recurrence times are finite for almost all $\omega \in E$. In particular, if $T$ is ergodic, they are all finite for almost all $\omega \in \Omega$.

Let $\Omega'$ be the collection of all $\omega$ for which $\nu_1(\omega), \nu_2(\omega), \ldots$ are all finite. Let $\omega \in \Omega'$. For any positive integer $k$, let $n_k = \nu_1(\omega) + \cdots + \nu_k(\omega)$. Then

$\dfrac{\nu_1(\omega) + \cdots + \nu_k(\omega)}{k} = \dfrac{n_k}{k} = \dfrac{n_k}{I_E(T\omega) + \cdots + I_E(T^{n_k}\omega)}.$

Therefore, for almost all $\omega \in \Omega'$, there exists the limit

(3)  $\lim_{k \to \infty} \dfrac{\nu_1(\omega) + \cdots + \nu_k(\omega)}{k} = \dfrac{1}{f(\omega)}.$

The limit is finite or infinite according as $f(\omega) > 0$ or $f(\omega) = 0$. If $\omega \notin \Omega'$ then there is a positive integer $K$ for which $\nu_{K+1}(\omega) = \nu_{K+2}(\omega) = \cdots = \infty$. Therefore

$\lim_{k \to \infty} \dfrac{\nu_1(\omega) + \cdots + \nu_k(\omega)}{k} = \infty.$

It is clear that $I_E(T\omega) + \cdots + I_E(T^n\omega) \le K$ for all $n$ and that

$\lim_{n \to \infty} \dfrac{I_E(T\omega) + \cdots + I_E(T^n\omega)}{n} = 0.$

Therefore (3) again holds true. Hence (3) holds true with probability one. If $T$ is ergodic, $1/f(\omega) = 1/P(E)$.
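
A toy case makes the limit $1/P(E)$ concrete. The sketch below is an illustration under an assumption not made in the paper: the process is a fixed periodic 0/1 pattern started at a uniformly random phase, which is stationary and ergodic, so that $P(E)$ is the fraction of marked positions. Function names are hypothetical.

```python
from fractions import Fraction


def recurrence_times(pattern, start, k):
    """First k successive recurrence times of the marked set.

    pattern is one period of a cyclic 0/1 sequence (1 marks a visit to B);
    start is the phase playing the role of time 0.
    """
    if not any(pattern):
        raise ValueError("the marked set must be non-empty")
    n = len(pattern)
    times, pos = [], start
    for _ in range(k):
        step = 1
        while not pattern[(pos + step) % n]:
            step += 1
        times.append(step)
        pos = (pos + step) % n
    return times


def conditional_mean_first_recurrence(pattern):
    """Mean of the first recurrence time given that time 0 is marked."""
    marked = [i for i, hit in enumerate(pattern) if hit]
    total = sum(recurrence_times(pattern, i, 1)[0] for i in marked)
    return Fraction(total, len(marked))
```

For the pattern `[1, 0, 0, 1, 0, 1, 0, 0, 0, 0]` the marked fraction is 3/10; both the time average of the recurrence times and the conditional mean of the first recurrence time equal 10/3.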

Let $P_E$, $E = [X_0 \in B]$, be the conditional probability measure given $X_0 \in B$, i.e., for any measurable set $F$,

(4)  $P_E(F) = P(E \cap F)/P(E).$

Then $\nu_1, \nu_2, \ldots$ are finite valued with probability one under the probability measure $P_E$.


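THEOREM 2. The random variables $\nu_1, \nu_2, \ldots$ are stationary under the conditional probability measure $P_E$, i.e.,

(5)  $P_E[\nu_1 = i_1, \ldots, \nu_k = i_k] = P_E[\nu_{m+1} = i_1, \ldots, \nu_{m+k} = i_k]$

for any positive integers $m$, $k$, and any $k$-tuple of positive integers $(i_1, \ldots, i_k)$.

PROOF: We shall proceed by induction on the integer $m$. Let $F_{i_1, \ldots, i_k} = [\nu_1 = i_1, \ldots, \nu_k = i_k]$, and let $E' = \Omega - E$. Then

$P_E[\nu_2 = i_1, \ldots, \nu_{k+1} = i_k]$
$\quad = \sum_{n=1}^{\infty} P_E[\nu_1 = n, \nu_2 = i_1, \ldots, \nu_{k+1} = i_k]$
$\quad = \sum_{n=1}^{\infty} P_E(T^{-1}E' \cap \cdots \cap T^{-(n-1)}E' \cap T^{-n}E \cap T^{-n}F_{i_1, \ldots, i_k})$
$\quad = \dfrac{1}{P(E)} \sum_{n=1}^{\infty} P(E \cap T^{-1}E' \cap \cdots \cap T^{-(n-1)}E' \cap T^{-n}E \cap T^{-n}F_{i_1, \ldots, i_k})$
$\quad = \dfrac{1}{P(E)} \sum_{n=1}^{\infty} P(T^{n}E \cap T^{n-1}E' \cap \cdots \cap TE' \cap E \cap F_{i_1, \ldots, i_k})$
$\quad = \dfrac{1}{P(E)}\, P\Big[\Big(\bigcup_{n=1}^{\infty} T^{n}E\Big) \cap E \cap F_{i_1, \ldots, i_k}\Big].$

The Poincaré recurrence theorem ([1], p. 10) asserts that

$P\Big[\Big(\bigcup_{n=1}^{\infty} T^{n}E\Big) \cap E\Big] = P(E).$

Hence

$P_E[\nu_2 = i_1, \ldots, \nu_{k+1} = i_k] = \dfrac{1}{P(E)}\, P[E \cap F_{i_1, \ldots, i_k}] = P_E[F_{i_1, \ldots, i_k}] = P_E[\nu_1 = i_1, \ldots, \nu_k = i_k].$

Hence (5) is true for $m = 1$ and any $k$ and any $k$-tuple $(i_1, \ldots, i_k)$. Now assume that (5) holds true for all $m \le M$. Then

$P_E[\nu_{M+2} = i_1, \ldots, \nu_{M+1+k} = i_k] = \sum_{n=1}^{\infty} P_E[\nu_{M+1} = n, \nu_{M+2} = i_1, \ldots, \nu_{M+1+k} = i_k]$
$\quad = \sum_{n=1}^{\infty} P_E[\nu_1 = n, \nu_2 = i_1, \ldots, \nu_{k+1} = i_k] = P_E[\nu_2 = i_1, \ldots, \nu_{k+1} = i_k]$
$\quad = P_E[\nu_1 = i_1, \ldots, \nu_k = i_k].$

Hence (5) is true for all $m$.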
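THEOREM 3: Let $f(\omega)$ be defined by (2), i.e., $f(\omega)$ is the limit, as $n \to \infty$, of the relative frequency of occurrence of $B$. Then, for any $k$,

(6)  $\int \nu_k(\omega)\, P_E(d\omega) = \int \dfrac{1}{f(\omega)}\, P_E(d\omega).$

The conditional expectation of the $k$th recurrence time given $X_0 \in B$ is finite if and only if $1/f(\omega)$ is integrable with respect to $P_E$. In particular, if the shift transformation $T$ is ergodic, then

(7)  $\int \nu_k(\omega)\, P_E(d\omega) = \dfrac{1}{P(E)}.$

PROOF. By Theorem 1, the set of all $\omega$ such that

$\lim_{k \to \infty} \dfrac{\nu_1(\omega) + \cdots + \nu_k(\omega)}{k} = \dfrac{1}{f(\omega)}$

has $P_E$ measure 1. Since the process $\nu_1, \nu_2, \ldots$ is stationary under $P_E$ by Theorem 2, the conditional expectations $\int \nu_k(\omega)\, P_E(d\omega)$ are the same for all $k$. If $\int \nu_k(\omega)\, P_E(d\omega) < \infty$, since $\{\nu_k\}$ is stationary, (6) follows easily from the ergodic theorem. If $\int \nu_k(\omega)\, P_E(d\omega) = \infty$, let

$\nu_k^N(\omega) = \nu_k(\omega)$ if $\nu_k(\omega) \le N$, and $\nu_k^N(\omega) = N$ otherwise.

Then the process $\nu_1^N, \nu_2^N, \ldots$ is again stationary under $P_E$, and therefore the set of $\omega$ for which $\lim_{k \to \infty} (\nu_1^N(\omega) + \cdots + \nu_k^N(\omega))/k$ exists has $P_E$ measure 1. Let $g_N(\omega)$ be the limit. We have

$\int \nu_k^N(\omega)\, P_E(d\omega) = \int g_N(\omega)\, P_E(d\omega).$

But $g_N(\omega) \le 1/f(\omega)$, hence

$\int \nu_k^N(\omega)\, P_E(d\omega) \le \int \dfrac{1}{f(\omega)}\, P_E(d\omega).$

Since

$\lim_{N \to \infty} \int \nu_k^N(\omega)\, P_E(d\omega) = \int \nu_k(\omega)\, P_E(d\omega) = \infty,$

we have

$\int \dfrac{1}{f(\omega)}\, P_E(d\omega) = \infty,$

and again (6) is true.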
ON THE MUTUAL INDEPENDENCE OF CERTAIN STATISTICS


By C. G. Khatri
University of Baroda, India


1. Summary and introduction. The results of this note yield the mutual independence of certain matrices, characteristic roots, Hotelling's $T^2$ or Mahalanobis' $D^2$ statistics, and C. R. Rao's $R$ statistic. The result concerning the mutual independence of certain of Hotelling's $T^2$ statistics has been proved by K. S. Rao [1]. The results mentioned here occur (by implication and as a by-product) in [5], [6] and [7] in the course of investigations on some specific problems in statistical inference but are not explicitly stated. These results can be utilised in statistical inference and especially in simultaneous tests and simultaneous confidence interval estimation, and also in other problems.


2. Certain known results. 
(2.1) Let $X: p \times n$ $(n \ge p)$ be a matrix of $p$ rows and $n$ columns. (A column vector is denoted as $x: p \times 1$.) Let $X$ have a distribution $f(X)$. Then $XX'$ is symmetric positive definite.
(2.2) If $S: p \times p$ is symmetric positive definite, then $S = TT'$, where $T$ is a triangular matrix with zeros above the principal diagonal, $t_{jj} > 0$, and $t_{jj}^2 = |A_j| / |A_{j-1}|$ (with $|A_0| = 1$), where

$A_j = \begin{pmatrix} s_{11} & \cdots & s_{1j} \\ \vdots & & \vdots \\ s_{j1} & \cdots & s_{jj} \end{pmatrix}.$

(See [2].)


(2.3) The roots of $XX'$ are the roots of $X'X$ except for some zero roots. (See [5], [7].)
(2.4) If $X_j$ and $Y_j$ are transformed to $X_{j+1}$ and $Y_{j+1}$ respectively, $j = 1, \ldots, k$, by the following matrix transformations,

$X_j = F_j(X_{j+1}, Y_j)$ and $Y_j = G_j(X_{j+1}, Y_{j+1})$ $(j = 1, 2, \ldots, k),$

then the Jacobian of the transformation $X_1$ to $X_{k+1}$ and $Y_1$ to $Y_{k+1}$ is

$J(X_1, Y_1; X_{k+1}, Y_{k+1}) = \prod_{j=1}^{k} J(X_j; X_{j+1})\, J(Y_j; Y_{j+1}).$

(See [2], [5], [7].)

(2.5) The Jacobian of the transformation $X = AYB$ ($X: p \times q$, $Y: p \times q$, $A: p \times p$, $B: q \times q$) is $J(X; Y) = |A|^q |B|^p$. (See [3].)

(2.6) The Jacobian of the transformation $S = GRG'$ ($S$, $R$: $p \times p$ symmetric matrices, $G: p \times p$) is $J(S; R) = |G|^{p+1}$. (See [3].)

(2.7) The Jacobian of the transformation $S = R^{-1}$ ($S$, $R$: $p \times p$ symmetric matrices) is $J(S; R) = |R|^{-(p+1)}$. (See [3].)

(2.8)  $\int_{S \le XX' \le S + (ds_{ij})} dX = \pi^{[2pn - p(p-1)]/4} \Big\{ \prod_{i=1}^{p} \Gamma\Big(\dfrac{n - i + 1}{2}\Big) \Big\}^{-1} |S|^{(n-p-1)/2}\, dS, \qquad X: p \times n\ (n \ge p),$

where the differential of a matrix denotes the product of the differentials of its variables. (See [4], [5], [7].)


Received December 16, 1957; revised March 9, 1959.
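
The relation in (2.2) between the diagonal of the triangular factor and the leading principal minors is easy to confirm numerically. The following pure-Python sketch is illustrative only (the function names are not from the note, and the determinant routine is a naive cofactor expansion meant for small matrices).

```python
from math import sqrt


def cholesky_lower(S):
    """Lower triangular T with positive diagonal such that S = T T'."""
    p = len(S)
    T = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = S[i][j] - sum(T[i][k] * T[j][k] for k in range(j))
            if i == j:
                if acc <= 0:
                    raise ValueError("matrix is not positive definite")
                T[i][j] = sqrt(acc)
            else:
                T[i][j] = acc / T[j][j]
    return T


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1.0
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for c in range(len(M)):
        minor = [row[:c] + row[c + 1:] for row in M[1:]]
        total += (-1) ** c * M[0][c] * det(minor)
    return total


def leading_minor(S, j):
    """Determinant of the leading j x j block of S (equal to 1 for j = 0)."""
    return det([row[:j] for row in S[:j]])
```

For `S = [[4, 2, 2], [2, 5, 3], [2, 3, 6]]` the factor is `[[2, 0, 0], [1, 2, 0], [1, 1, 2]]`, and each squared diagonal element equals the ratio of consecutive leading minors, 4/1, 16/4 and 64/16.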


3. Theorem I. If $S: p \times p$ and $X: p \times q$ are independently distributed as Wishart $(n, p; \Sigma; S)$ and $MN(0, \Sigma)$ respectively (where $MN(0, \Sigma)$ is called multivariate normal and has a density which is a multiple of $\exp[-(\mathrm{tr}\, \Sigma^{-1}XX')/2]$), then $S_1 = S + XX'$ and $Z = T^{-1}X$ (where $S_1 = TT'$) are independent with distributions given by Wishart $(n + q, p; \Sigma; S_1)$ and the density $c\,|I - ZZ'|^{(n-p-1)/2}$ respectively. Here

$c = \pi^{-pq/2} \prod_{i=1}^{p} \Gamma\Big(\dfrac{n + q - i + 1}{2}\Big) \Big\{ \Gamma\Big(\dfrac{n - i + 1}{2}\Big) \Big\}^{-1}.$

PROOF. Transform by the relations $S = S_1 - XX'$ and $X = TZ$, where $S_1 = TT'$; then by (2.4) and (2.5), the Jacobian of the transformation is $|T|^q = |S_1|^{q/2}$; also $|S| = |S_1|\,|I - ZZ'|$.

Since $S$ and $X$ are independently distributed, we can easily see that the joint distribution of $S_1$ and $Z$ is

(1)  $f(S_1, Z) = f_1(S_1)\, f_2(Z),$

where

$f_1(S_1) = $ Wishart $(n + q, p; \Sigma; S_1),$
$f_2(Z) = c\,|I - ZZ'|^{(n-p-1)/2},$

$c$ being the same as defined in the Theorem.
COROLLARY 1. If $S: p \times p$, $X_i: p \times q_i$, $i = 1, 2, \ldots, m$, are independently distributed as Wishart $(n, p; \Sigma; S)$ and $MN(0, \Sigma)$ respectively, then $S_m = S + \sum_{j=1}^{m} X_jX_j'$ and $Z_i = T_i^{-1}X_i$ (where $S_i = T_iT_i' = S + \sum_{j=1}^{i} X_jX_j'$) are mutually independent and distributed with the respective densities

(2)  Wishart $(n + t_m, p; \Sigma; S_m)$

and

(3)  $\pi^{-pq_i/2} \prod_{j=1}^{p} \Gamma\Big(\dfrac{n + t_i - j + 1}{2}\Big) \Big\{ \Gamma\Big(\dfrac{n + t_{i-1} - j + 1}{2}\Big) \Big\}^{-1} |I - Z_iZ_i'|^{(n + t_{i-1} - p - 1)/2},$

where $t_i = \sum_{j=1}^{i} q_j$.
The proof can be obtained in a similar manner as above.
COROLLARY 2. In Corollary 1, suppose $p \ge q_i$. Then the distribution of $V_i = Z_i'Z_i = X_i'(S + \sum_{j=1}^{i} X_jX_j')^{-1}X_i$ is given by

(4)  $C\,|V_i|^{(p - q_i - 1)/2}\,|I - V_i|^{(n + t_{i-1} - p - 1)/2},$

$C = \pi^{-q_i(q_i - 1)/4} \prod_{j=1}^{q_i} \Gamma\Big(\dfrac{n + t_i - j + 1}{2}\Big) \Big\{ \Gamma\Big(\dfrac{p - j + 1}{2}\Big)\, \Gamma\Big(\dfrac{n + t_i - p - j + 1}{2}\Big) \Big\}^{-1},$

with $t_i = q_1 + \cdots + q_i$ as in Corollary 1.

PROOF. The distribution of $V_i$ can at once be obtained by the use of the integral (2.8) in formula (3).

Note that the distribution of $W_i = V_i(I - V_i)^{-1} = X_i'(S + \sum_{j=1}^{i-1} X_jX_j')^{-1}X_i$ is obtained by using the Jacobian (2.6) for $(I - V_i) = (I + W_i)^{-1}$ and we find it to be

(5)  $C\,|W_i|^{(p - q_i - 1)/2}\,|I + W_i|^{-(n + t_i)/2}.$

Similarly if $p < q_i$, the distribution of $Z_iZ_i'$ and $Z_iZ_i'(I - Z_iZ_i')^{-1}$ can be obtained from (3).

COROLLARY 3. If in Corollary 1, $q_i = 1$ for all $i$, then

$T_i^2 = x_i'\Big(S + \sum_{j=1}^{i-1} x_jx_j'\Big)^{-1} x_i, \quad i = 1, 2, \ldots, m,$

and

$S_m = S + \sum_{j=1}^{m} x_jx_j'$

are mutually independent with distributions given by

const. $(T_i^2)^{(p-2)/2}(1 + T_i^2)^{-(n+i)/2}, \quad i = 1, 2, \ldots, m,$

and Wishart $(m + n, p; \Sigma; S_m)$.

COROLLARY 4. If $S: p \times p$, $X_i: p \times q_i$, $i = 1, 2, \ldots, m$, are independently distributed as Wishart $(n, p; \Sigma; S)$ and $MN(0, \Sigma)$, then the characteristic roots of $A_i(S + \sum_{j=1}^{i} A_j)^{-1}$ are distributed independently of those of $S_m = S + \sum_{j=1}^{m} A_j$ (where $A_i = X_iX_i'$).

PROOF. By Corollary 1, we have $Z_i$ and $S_m$, $i = 1, 2, \ldots, m$, independently distributed and hence roots of $Z_i'Z_i$ and $S_m$ are independently distributed; i.e. by the application of (2.3) we have the corollary.

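THEOREM 2. Let $S: p \times p$, $x: p \times 1$ be independently distributed as Wishart $(n, p; \Sigma; S)$ and multivariate normal $(\mu, \Sigma)$, $\mu \ne 0$, respectively, and let

$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},$

where $x_1$ and $\mu_1$ are $q \times 1$, $S_{11}$ and $\Sigma_{11}$ are $q \times q$, and the remaining blocks have $p - q$ rows or columns,

$\Delta_q^2 = \mu_1'\Sigma_{11}^{-1}\mu_1$ and $\Delta_p^2 = \mu'\Sigma^{-1}\mu.$

If $\Delta_p^2 = \Delta_q^2$ then $R = (1 + x_1'S_{11}^{-1}x_1)/(1 + x'S^{-1}x)$ and $S_1 = S + xx'$ are independently distributed and their densities are constant $R^{(n-p-1)/2}(1 - R)^{(p-q-2)/2}$ and non-central Wishart $(n + 1, p, 1; \Sigma, \mu; S_1)$ with 1 (1 is trivially the rank of $\mu$) non-central parameter (for a discussion of the $R$ statistic, see [8]).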

Proor. Let 25 = BB’. Make the transformations V) = BSB’, y = B's, 
Vi = Vo + yy’ and ai = w where V; = TT". By (2. 5) and (2.6) the Jacobian 
of the transformation is | B |?**| 7 | = | = \°?*”"| V, |”. Also 

1S] =|V; |. | Yl=/>\./Vi).d1 — w’y). 

Let 


we aero bs q q BR, : y e (2 y me (Se .\¢@ 
7 5 as Cy : = ei B; pe “= We) p«’ mn T. 1; a 
.) Dae @ p-4a 


Then we get 22) = BB, bid: = wiZinn/2 = A3/2 and §’§ = Aj,/2. Hence if 
Ai, = A’, we have 


(7) dodo = OiLe. & = 0. 


Hence we can write down the joint distribution of V; and w as 


RGcul « (I = ~— - y’ w)*?- aad Vi |’? oxy (—tr Vi + 28:71 wr — 44/2) 
rr rn 


von TT (A@—5t*) 


i=] - 


Apply the transformation z = w:/V(1 — wy). The Jacobian is 
/ /2 . . 
(1 — wiw,)'” ”””. Hence equation (8) is 


(9) S(Vi, wi, Ze) = fi(Vi, wi)fe( ze) 


where 
y , n—q—1)/2) yx |(n—p)/2 ; , ‘ Ty 2 1a\ 
(10) fil } 1; wi) = Co( 1 — wiv) - | J 1} - exp (-tr J 1 ~_ 26:71: —_ Ay 2), 


2 2 ) _ a ales 7 
c being x 4?” * 7 (2—2t1) {r(2= e a II r(2#=<+ + ye 
2 2 a 


and 


: —q+ _ = acai ' 
(11) fale) = PP (2—s+}) ir (2 20 ye (1 — 252)??? 


Hence zs and V; are independently distributed; i.e. V; and ze 2Z2 are independently 
distributed. Note that Z222 =1—-R=(w'w- wiw:)/(1 — wiw:) forl — w’y = 
1-yVry = 1/i+ y'Vo'y). Hence the distribution of R can be easily ob- 
tained by the use of the integral (2.8) in (11). By integrating over wy, in (10), 
it can be easily shown that the distribution of V; (and so of S,) is non-central 
Wishart with (n + 1) d.f. and one (1 is the rank of 4) non-central parameter. 


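COROLLARY 5. Let $S: p \times p$, $x_i: p \times 1$, $i = 1, 2, \ldots, m$, be independently distributed as Wishart $(n, p; \Sigma; S)$ and multivariate normals $(\mu_i, \Sigma)$ and let

$x_i = \begin{pmatrix} x_{1 \cdot i} \\ x_{2 \cdot i} \end{pmatrix}, \quad S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \quad \mu_i = \begin{pmatrix} \mu_{1 \cdot i} \\ \mu_{2 \cdot i} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$

where $x_{1 \cdot i}$ and $\mu_{1 \cdot i}$ are $q \times 1$, $S_{11}$ and $\Sigma_{11}$ are $q \times q$, and the remaining blocks have $p - q$ rows or columns,

$\Delta_{q \cdot i}^2 = \mu_{1 \cdot i}'\Sigma_{11}^{-1}\mu_{1 \cdot i}$ and $\Delta_{p \cdot i}^2 = \mu_i'\Sigma^{-1}\mu_i.$

If $\Delta_{p \cdot i}^2 = \Delta_{q \cdot i}^2$, then

$R_i = \dfrac{1 + x_{1 \cdot i}'\Big(S_{11} + \sum_{j=1}^{i-1} x_{1 \cdot j}x_{1 \cdot j}'\Big)^{-1} x_{1 \cdot i}}{1 + x_i'\Big(S + \sum_{j=1}^{i-1} x_jx_j'\Big)^{-1} x_i}, \quad i = 1, 2, \ldots, m,$

and $S_m = S + \sum_{j=1}^{m} x_jx_j'$ are mutually independent and their distributions are given by const. $R_i^{(n-p+i-2)/2}(1 - R_i)^{(p-q-2)/2}$, $i = 1, 2, \ldots, m$, and non-central Wishart $(m + n, p, t; \Sigma, \mu; S_m)$ with $t$ ($t$ is the rank of $\mu = (\mu_1, \ldots, \mu_m)$) non-central parameters.
The proof can be obtained in a similar manner as above.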


Note: During the time of revision, the author has obtained the most general 
form of Theorem II which yields Theorem I as a corollary. 
A NOTE ON THE STOCHASTIC INDEPENDENCE OF FUNCTIONS
OF ORDER STATISTICS


By Gerald S. Rogers
University of Arizona


The theorem presented below appears to be useful in determining whether certain functions of order statistics are stochastically independent (briefly, independent). The following result has appeared in the literature in various forms, e.g. [1]; the statement here is the particular form used in the proof of the second part of the theorem.

LEMMA: Let $s$ be a complete sufficient statistic for a family of probability density functions indexed by a parameter $\theta$. Let $t$ be any other statistic, not a function of $s$ alone. Then $s$ and $t$ are independent if and only if the distribution of $t$ does not depend on $\theta$.

THEOREM: Let $x$ be a real random variable with distribution function $F(x) = \int_{-\infty}^{x} f(X)\, dX$, where $f(x)$ is a non-degenerate probability density function (pdf). Let $x_1 \le x_2 \le \cdots \le x_n$ be the order statistics based on a random sample of size $n \ge 2$ from this $x$ distribution. Let $z = z(x_1, \ldots, x_j)$ be a statistic based on the first $j < n$ order statistics only. Then the following two statements are equivalent:

(a) $z$ is independent of $x_k$ for some $k \ge j$;

(b) $z$ is independent of the set $\{x_k : j \le k \le n\}$.

PROOF: Notation: let $g(A)$ [$g(A \mid C)$] denote the ordinary [conditional] pdf of $A$ [given $C$]. To show that (a) implies (b), first suppose that in (a), $k = j$. It follows directly from the definition of conditional pdf's that

$g(x_1, \ldots, x_{j-1} \mid x_j) = g(x_1, \ldots, x_{j-1} \mid x_j, \ldots, x_n),$

and hence that

$g(z \mid x_j) = g(z \mid x_j, \ldots, x_n).$

Under the hypothesis (a), $g(z \mid x_j) = g(z)$, and therefore, $g(z \mid x_j, \ldots, x_n) = g(z)$.¹ Thus, $z$ is independent of the set in (b).

Now suppose that in (a), $k > j$. Then (as is readily shown by direct computation), in the conditional pdf $g(x_1, \ldots, x_{k-1} \mid x_k)$, $x_k$ may be considered as a "parameter" for which the conditional random variable $x_{k-1}$ given $x_k$, written $(x_{k-1} \mid x_k)$, is a "complete sufficient statistic." Under the hypothesis (a), $g(z \mid x_k) = g(z)$, so that the distribution of $z$ given $x_k$ actually does not depend upon the "parameter" $x_k$. Therefore, by the lemma, $(z \mid x_k)$ and $(x_{k-1} \mid x_k)$ are independent. In terms of the pdf's,

$g(z, x_{k-1} \mid x_k) = g(z \mid x_k)\, g(x_{k-1} \mid x_k).$

Since $g(z \mid x_k) = g(z)$, $g(z, x_{k-1}, x_k) = g(z)\, g(x_{k-1}, x_k)$, and hence, $g(z, x_{k-1}) = g(z)\, g(x_{k-1})$.

Received July 15, 1958; revised July 6, 1959. 
1 One referee pointed out that this is well known in the theory of Markov Chains. 
This is a sufficient condition that $z$ and $x_{k-1}$ be independent. By repeated application of this process, it follows that $z$ is independent of $x_j$. But as shown above, this is also a sufficient condition that $z$ be independent of the set in (b). This completes the proof that (a) implies (b). That (b) implies (a) is evident.

The proof of an analogous theorem wherein $y = y(x_j, \ldots, x_n)$ is independent of the set $\{x_i : 1 \le i \le j\}$ is similar. Both theorems will also hold with respect to the sets $\{w_1, \ldots, w_j\}$ and $\{w_j, \ldots, w_n\}$ under a strictly monotone transformation $w = M(x)$. Moreover, since the theorems hold for the order statistics $x_1^* \le \cdots \le x_n^*$ obtained by sampling from a uniform distribution over (0, 1), and since the transformation $x^* = F(x)$ is independent of the choice of points in intervals to which $F$ assigns zero probability [2], it follows that both theorems will hold under the weaker hypothesis that $F(x)$ is continuous. The truth or falsity of the theorems in the discrete or mixed cases remains an open question.

The author wishes to thank Professors A. T. Craig and R. V. Hogg under whose 
direction the theorem was evolved as part of a doctoral dissertation at the State 
University of Iowa. 
CORRECTION NOTES 


CORRECTIONS TO 
“CONTINUED FRACTIONS FOR THE INCOMPLETE BETA FUNCTION” 
By Leo A. AROIAN 
Hughes Aircraft Company 
On page 218, lines 13 and 14, of this article (Ann. Math. Stat., Vol. 12 (1941), pp. 218-223), replace $b_{2s}$ and $b_{2s+1}$ by

$b_{2s} = -\dfrac{(p + s - 1)(q - s)}{(p + 2s - 2)(p + 2s - 1)} \cdot \dfrac{x}{1 - x}$

and

$b_{2s+1} = \dfrac{s(p + q + s - 1)}{(p + 2s - 1)(p + 2s)} \cdot \dfrac{x}{1 - x}.$

On page 220, line 2, replace $-1 < x < \infty$ by $-\infty < x < 1$.

On page 222 the statement is made that J.,(2.5, 1.5) could not be done by 
Müller's continued fraction. This is incorrect. Both continued fractions may be used for the range of $0 < x < 1$, and in this range both continued fractions equal $I_x(p, q)$.

On page 222 at the second line of Section 6, eliminate the words "due to the possible divergence of the series on which it is based." The rest of this paragraph is correct and both continued fractions may be used for $I_x(p, q)$.






CORRECTIONS TO 
“SEQUENTIAL DECISION PROBLEMS FOR PROCESSES WITH 
CONTINUOUS TIME PARAMETER-TESTING HYPOTHESES” 


By A. Dvoretzky, J. Kiefer, and J. Wolfowitz


The following corrections should be made on p. 259 of the above-titled paper (Ann. Math. Stat., Vol. 24 (1953), pp. 254-264): The mean occurrence time is $1/\lambda$, not $\lambda$, on line 4 of Section 4. In (4.2) the plus sign should be a minus sign.




CORRECTIONS TO 
“DISTRIBUTION OF THE MAXIMUM OF THE ARITHMETIC 
MEAN OF CORRELATED RANDOM VARIABLES” 


By JoHn GURLAND 


Iowa State University
It has been called to my attention by P. R. Krishnaiah and M. M. Rao that the multivariate Gamma distribution with constant correlation between the components is not unique. Although the particular case employed in the paper (Ann. Math. Stat., Vol. 26 (1955), pp. 294-300) is stated unambiguously on the second page, it is hoped that the following changes on the first page will help in avoiding any possible misinterpretation:

(i) Page 294. First sentence of summary; line 2. Insert the words “a particular 
case of a” before the words ‘“‘multivariate analogue”’. 

(ii) Page 294. Last sentence. Remove period at end of sentence and add the 
following: ‘‘and a special case of the multivariate analogue of the Pearson Type 
III distribution represented by (2).” 

The following corrections are also kindly pointed out by Krishnaiah and Rao: 

(iii) Page 294. Line 6 from bottom. Replace p by p’. 

(iv) Page 295. Equation (2) is valid for $\lambda = n/2$ but not for all $\lambda > 0$. This does not affect the validity of the results obtained in the paper since the infinitely divisible distribution in (4) is valid for all $\lambda > 0$.




CORRECTIONS TO 


“APPROXIMATION AND GRADUATION ACCORDING TO THE 
PRINCIPLE OF LEAST SQUARES BY ORTHOGONAL 
POLYNOMIALS” 


By CHARLES JORDAN 


The following corrections should be made in the above-titled paper (Ann. 
Math. Stat., Vol. 3(1932), pp. 257-357): 
On page 335, instead of 
m+1 


Cas 


2m+1 and limC,. = 2m +1, 


s=0 N=o 
it should read 
m+1 
> | Cme| = 2m+1 and lim|C,.| = 2m +1. 
a=) N=o 


2 
On p. 356, (32) should be 184756. In the original the last number is incorrect. 




CORRECTIONS TO 


“QUASI-RANGES IN SAMPLES FROM AN EXPONENTIAL 
POPULATION” 


By Paul R. Rider
Wright Air Development Center 


In the paper cited in the title (Ann. Math. Stat., Vol. 30(1959), pp. 252-254), 
p. 253, fourth display, the exponent of the factor e should be —2,4; — (r + 1)2,_, 
instead of —rz,_, . I thank Mr. George E. Bardwell for pointing this out. 
The journal in reference [6] should be Annals of the Institute of Statistical 
Mathematics instead of Ann. Math. Stat. 




CORRECTIONS TO 
“ON BALANCING IN FACTORIAL EXPERIMENTS” 


By B. V. Shah
University of Bombay
In the paper cited in the title (Ann. Math. Stat., Vol. 29 (1958), pp. 766-779), on p. 766, lines 23-26, the sentence should read as follows: "The set up assumed is that yield of a plot in the jth block having ith treatment is $\mu + a_j + t_i + e_{ij}$, where $\mu$ is over all effect, $a_j$ is the effect of the jth block, $t_i$ is the effect of the ith treatment and $e_{ij}$ is the experimental error."


On p. 776, line 2, change “any contrast’ to “any normalised contrast’. 


On p. 777, in equations (7.7), (7.8) and (7.9), change “(—1)"™ 
to “(—1)**”, 


I am indebted to a referee of a subsequent paper for pointing out these cor- 
rections. 




CORRECTION TO 
“A TABLE FOR COMPUTING TRIVARIATE NORMAL 
PROBABILITIES” 
By George P. Steck
Sandia Corporation 


The following correction should be made to the paper of the above title (Ann. 
Math. Stat., Vol. 29 (1958), pp. 780-800): 
Pages 790-799: replace “m” by “h’’ in the table headings. 




CORRECTIONS TO 
“A GENERALIZATION OF THE GLIVENKO-CANTELLI 
THEOREM” 
By Howarp G. Tucker 


University of California, Riverside 


The paper cited above (Ann. Math. Stat., Vol. 30 (1959), pp. 828-830) con- 
tains several errors for which corrections are given below. 
Inequality (4) should read 
k k 
(4) Le Pa(Xsara)la, < F(x) < LF a(Xin — 0)Iu, 
j= j= 


Inequality (6) should be replaced by 


F(z|5) — F.(x) Ss a (F(X — 0|3) — Fa(Xj-are)) Ia, 


7 


Do (F(X — 015) — F(Xsa4|3))Ia, 


j=l 


ll 


(6) k 
+ 2, (F(Xja4e!3) — Pa(Xjau))Lu, 
= 
S max | F,(Xjx) — F(X |3) | + 1/k. 
lsisk 
Inequality (7) should be replaced by 
(7) F(#|3) — F(x) 2 —max | F,(Xj. — 0) — F(X — 0) 13) | — I/k. 


lsisk 


Inequality (8) should be replaced by 


| Fa(z) — F(x|3)| S 1/k + max {| F(X; — 0) 
(8) lsjsk 
— F(X jx — 0|5) |, |Pa(Xe) — P(X | 35) |}. 


Immediately after inequality (8) the following sentence should be added: 
In a way similar to the proof on the bottom of page 829 one may easily verify 
that P[F.:(Xj, — 0) 2 F(X, — 0|3)) = 1. 




CORRECTION TO 
“ON THE THEORY OF BAN ESTIMATES” 


By Robert A. Wijsman¹
University of Illinois


I am greatly indebted to Dr. Lucien LeCam for calling to my attention an 
error in the proof of Theorem 1 of the paper cited in the title (Ann. Math. Stat. 
Vol. 30 (1959), pp. 185-191). The transition from (12) to (13) is in general 
not justified. Worse, the theorem itself is false in general, as can be shown with 
a counter example. In order to remedy the situation, the assumptions have to be 
strengthened. This can be done either on the distributions of the Z, , or on the 
estimator 6. As an example of the first, if the Z, have densities which (when 
normalized) converge a.e. to the limiting normal density, then the transition 


1 Work supported by the National Science Foundation. 
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from (12) to (13) is valid [9] and with that the proof of Theorem 1 is correct. 
However, this seems too strong an assumption to be of much practical value, 
since so many examples deal with discrete random variables. Turning now to 
assumptions on $\hat\theta$, we could require (4) to be true for all sequences $Z_n$ for which
(1) holds. Taking then in particular $Z_n : N(\zeta(\theta), \Sigma(\theta)/n)$, the previous case
(convergence of densities) applies, and the conclusion of Theorem 1 follows.
A more attractive, even though slightly stronger, assumption on $\hat\theta$ is to require
it to be differentiable in every point of $U$. This insures, of course, continuity in
every point of $U$ but not continuity in a neighborhood of $U$, let alone
differentiability in a neighborhood of $U$ which would be the requirement for a regular
(1) estimate. We are thus led to the following modification of Definition 2
and regular (2):

DEFINITION 3. $\hat\theta$ will be called regular (3) if (i) $\hat\theta(\zeta(\theta)) = \theta$ identically
in $\theta$;² (ii) $\hat\theta$ is differentiable in every point $\zeta(\theta)$ of $U$.

Let the matrix derivative of $\hat\theta$ in the point $\zeta(\theta)$ be denoted by $A(\theta)$. Theorem
1 now follows immediately by differentiation of (i) of Definition 3 (which is the
same as equation (2)). A few remarks about $A(\theta)$ are in order. In the first place,
the existence of this derivative in every point of $U$ implies (4) for every sequence
$Z_n$ satisfying (1). Secondly, it is not necessary to require $A$ to be continuous in
$\theta$. However, if $\hat\theta$ is constructed according to Theorem 2, then $A = (BV)^{-1}B$
(see eq. (6)) so that $A$ is continuous due to the continuity assumptions on $B$
and $V$. Under all circumstances, the $A$ corresponding to any BAN estimate is
continuous since it is given by $A = (V'\Sigma^{-1}V)^{-1}V'\Sigma^{-1}$.
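
As a quick numerical sanity check of the two expressions for A (a sketch, not part of the original note): assuming that V denotes the k x m matrix of derivatives of the parametrization with respect to the parameter, differentiating condition (i) of Definition 3 by the chain rule gives AV = I, and both A = (BV)^{-1}B and A = (V'S^{-1}V)^{-1}V'S^{-1} satisfy this relation. The example takes m = 1, so that BV and V'S^{-1}V are scalars, and a diagonal covariance matrix S; the numerical values of B, V and S are hypothetical.

```python
def a_from_b(B, V):
    """A = (BV)^{-1} B for a scalar parameter (m = 1); B and V are length-k lists."""
    bv = sum(b * v for b, v in zip(B, V))  # the 1 x 1 matrix BV
    if bv == 0:
        raise ValueError("BV must be non-singular")
    return [b / bv for b in B]


def a_ban(V, sigma_diag):
    """A = (V' S^{-1} V)^{-1} V' S^{-1} for m = 1 and a diagonal covariance S."""
    w = [v / s for v, s in zip(V, sigma_diag)]  # the row vector V' S^{-1}
    denom = sum(wi * v for wi, v in zip(w, V))  # the scalar V' S^{-1} V
    return [wi / denom for wi in w]


def times_v(A, V):
    """The product AV, a scalar when m = 1."""
    return sum(a * v for a, v in zip(A, V))


V = [1.0, 2.0, -1.0]          # hypothetical derivative vector (k = 3)
B = [0.5, 1.0, 3.0]           # hypothetical B with BV != 0
SIGMA_DIAG = [1.0, 4.0, 0.5]  # hypothetical diagonal covariance matrix

print(times_v(a_from_b(B, V), V))
print(times_v(a_ban(V, SIGMA_DIAG), V))
```

Both printed products should equal 1 up to rounding, whatever admissible B is chosen; only the second choice of A depends on the covariance matrix.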

It is somewhat remarkable that Theorem 2 remains true if, in the conclusion, 
regular (2) is replaced by the stronger regular (3). The surprise is that $\hat\theta$ turns
out to be differentiable in each point of U, even though no differentiability 
assumptions are made on B. Therefore, a proof of Theorem 2, with regular (2) 
replaced by regular (3), seems to be in order. Before doing this, it may be of 
interest to point out that Ferguson’s estimates [5] are also differentiable in 
each point of U since they are generated by (5) with $B(z, \theta)$ satisfying even
stronger assumptions than in Theorem 2. Comparing now the various kinds of 
regular estimates, we have that regular (1) estimates are continuously differ- 
entiable in a neighborhood of U, Ferguson’s estimates are continuous in a 
neighborhood of U and differentiable in every point of U, while regular (3) 
estimates are differentiable in every point of U. 

² The assumption (i) of Definition 3 is the same as equation (2). Instead, we could have
made the same assumption as in Definition 1 (i). The two assumptions are equivalent
since $\hat\theta$ is supposed to be continuous in each point of $U$.

PROOF OF THEOREM 2, with regular (2) replaced by regular (3). It suffices
to show that in each point of $U$ there is a neighborhood possessing the properties
ascribed to the neighborhood $N$ in the conclusion of Theorem 2. Then $N$ can be
taken as the union of the individual neighborhoods. Consider any point of $U$.
We may take this as the origin of the coordinate system in $Z$. For the purpose
of the proof we may make the same transformations as in Section 4 (observe
that $\zeta^{-1}$ is differentiable due to Assumption 2 (iii) and (iv)). We may suppose
then that $U$ is a linear subspace of $Z$, spanned by the first $m$ coordinate axes,
and that $\zeta$ is the identity function from $U$ to $U$. Thus we have identified $U$
with the parameter space $\Theta$. A point $u$ of $U$ has its last $k - m$ components equal
to 0; the $m$-vector formed by its first $m$ components will be written $\theta$. Let $I_{km}$
be a $k \times m$ matrix whose elements are 1 on the "main diagonal" and 0 otherwise.
We can write then $u = I_{km}\theta$. The transformations which we have employed
replace in (5) $\zeta(\theta)$ by $I_{km}\theta$, and $B(z, \theta)$ by some other matrix, which, however,
we shall again denote by $B(z, \theta)$. The matrix $V(\theta)$ is replaced by $I_{km}$. Put
$B(z, \theta) I_{km} = C(z, \theta)$, then by assumption $C(0, 0)$ is non-singular. Furthermore,
$C$ is continuous in $(z, \theta)$ at $(0, 0)$. Put $C^{-1}B = D$, then $D(z, \theta)$ exists in a
neighborhood of $(0, 0)$, is continuous in $(z, \theta)$ at $(0, 0)$ and is continuous in $\theta$
for each fixed $z$. Let $S_1 \times S_2$ be such a neighborhood, where $S_1$ is a solid $k$-sphere
about $z = 0$ and $S_2$ a solid $m$-sphere about $\theta = 0$. In addition, we may
choose the radii $r_1$ and $r_2$ of $S_1$ and $S_2$ so that for $(z, \theta) \in S_1 \times S_2$ we have
$\|D(z, \theta)\| \le r_2/r_1$. We now write (5) as


(25) $\theta = D(z, \theta)z.$


For each $z \in S_1$, the right hand side of (25) is a continuous transformation of $S_2$
into itself. According to the Brouwer fixed point theorem [7] there is a fixed point
of the transformation, therefore a solution $\hat\theta(z)$ to (25). Write


(26) $\hat\theta(z) = D(z, \hat\theta(z))z.$


For $z \in S_1$, $\|D(z, \hat\theta(z))\|$ is bounded, so $\hat\theta(z) \to 0$ as $z \to 0$. Hence $\hat\theta$ is
continuous at 0. From this we have $D(z, \hat\theta(z)) \to D(0, 0)$ as $z \to 0$, and from (26)
it follows then that $\hat\theta$ is differentiable at $z = 0$, with matrix derivative $D(0, 0)$.
This proves that on $S_1$ $\hat\theta$ is regular (3). In the original coordinate system the
matrix $D(0, 0)$ takes the form $(BV)^{-1}B$, evaluated at some point $(\zeta(\theta), \theta)$.
This leads immediately to (6). The last assertion in the conclusion of Theorem
2 is proved in [3].
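
The mechanism of the proof can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the matrix D(z, theta) is a hypothetical 1 x 2 example (m = 1, k = 2) that is continuous but deliberately not differentiable in theta at 0, and the fixed point of (25) is found by successive substitution, which happens to converge here because the example is a contraction for small z. The proof itself uses only Brouwer's theorem and needs no such property.

```python
import math


def D(z, theta):
    """Hypothetical 1 x 2 matrix D(z, theta): continuous, not differentiable in theta at 0."""
    return (1.0 + 0.5 * abs(theta), 2.0 + z[0])


def theta_hat(z, iterations=200):
    """Solve theta = D(z, theta) z by successive substitution, starting from 0."""
    theta = 0.0
    for _ in range(iterations):
        d = D(z, theta)
        theta = d[0] * z[0] + d[1] * z[1]
    return theta


def relative_linearization_error(z):
    """|theta_hat(z) - D(0, 0) z| / |z|, which should tend to 0 as z -> 0."""
    d0 = D((0.0, 0.0), 0.0)
    linear = d0[0] * z[0] + d0[1] * z[1]
    return abs(theta_hat(z) - linear) / math.hypot(z[0], z[1])


errors = [relative_linearization_error((h, h)) for h in (1e-1, 1e-2, 1e-3)]
for h, err in zip((1e-1, 1e-2, 1e-3), errors):
    print(h, err)
```

The relative error shrinks roughly in proportion to |z|, which is what differentiability at z = 0 with derivative D(0, 0) requires, even though D is not differentiable in theta.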


REFERENCE 

ABSTRACTS OF PAPERS
(Abstracts of papers not presented at any meeting of the Institute) 


6. On a χ²-Test with Cells Determined by Order Statistics. HERMANN WITTING,
University of Freiburg. (By title)


Let $X_1, \ldots, X_n$ be a sample of a continuous one-dimensional probability distribution
$Q(A)$; let $X_{n,1}, \ldots, X_{n,k-1}$ be order statistics for given ranks $r_{n,j}$ with
$p_{n,j} = (r_{n,j} - r_{n,j-1})/(n + 1) = p_j + o(1/\sqrt{n})$. Let $S_{n,j} = \{x: X_{n,j-1} < x \le X_{n,j}\}$.
For testing the hypothesis that $Q(A)$ belongs to an $s$-parametric class of probability
distributions $P(A, \theta)$ the test statistic $T_n = \sum_{j=1}^{k} n(P(S_{n,j}, \hat\theta_n) - p_{n,j})^2/p_{n,j}$
is used, where $\hat\theta_n$ is the minimum-χ²-estimate. Then if $Q(A) = P(A, \theta)$ or
$Q(A) = P(A, \theta) - q(A)/\sqrt{n}$, respectively, under certain regularity conditions $T_n$ is
asymptotically distributed as χ² with $(k - s - 1)$ degrees of freedom (and noncentrality
parameter $\sum_{j=1}^{k} q_j^2/p_j$, $q_j$ = p-lim $q(S_{n,j})$). Using $(k - 1)$ continuous functions
$\varphi_1(x), \ldots, \varphi_{k-1}(x)$, defining $\varphi_j(X_{n,j})$ successively by ordering the values
$\varphi_j(X_i)$ and defining $S_{n,j} = \{x: \varphi_l(x) > \varphi_l(X_{n,l}), l = 1, \ldots, j - 1; \varphi_j(x) \le \varphi_j(X_{n,j})\}$,
the same limiting behaviour of $T_n$ holds for probability distributions in a metric space.
The proof is based on the fact that the $Q(S_{n,j})$ are jointly β-distributed (cf. J. W. Tukey,
Ann. Math. Stat. 18 (1947) 529). Therefore $\sqrt{n}(Q(S_{n,j}) - p_{n,j})$ are asymptotically $N(0, C)$
where $C$ is of rank $(k - 1)$ and coincides with the covariance matrix of the multinomial
distribution underlying the corresponding classical χ²-test with the cells $S_j$ = p-lim $S_{n,j}$.

While having the same power, this modified χ²-procedure has certain advantages over the
classical χ²-test.


7. A Generalized Pitman Efficiency for Nonparametric Tests. HERMANN WITTING,
University of Freiburg. (By title)


Asymptotic expressions up to terms of order n~ are given for the efficiency of the Wil- 
coxon two-sample test relative to the £- and t-tests for nearby alternatives. The first term 
is the well-known Pitman efficiency; the remaining terms are corrections for finite sample 
sizes. Efficiency values are given for finite sample sizes in the case of normal and rectangular 
distributions and comparisons with the exact values are made. In general the Wilcoxon 
test is shown to be nearly as good locally for moderate sample sizes as it is known to be 
asymptotically. A similar analysis is performed for the single-sample sign test. 


(Abstracts of papers to be presented at the Washington, D. C., Annual Meeting of the Institute, 
December 27-30, 1959. Additional abstracts will appear in the March, 1960 issue.) 


1. Some Nonparametric Problems: I. V. P. BHAPKAR, University of North
Carolina and University of Poona. (By title)


Mood and Brown have considered a nonparametric test for the equality of row effects 
in the two-way classification with one observation per cell or the same number of observa- 
tions per cell. In this paper, first their test has been extended to cover incomplete block 
situations. For the BIBD in the usual terminology, if $m_i$ denotes the number of observations,
for the $i$th 'treatment', that exceed the respective 'block'-medians, then to test the
equality of 'treatment'-effects we have $\frac{k^2(k - 1)}{a(k - a)\lambda v} \sum_{i=1}^{v} (m_i - (ra/k))^2$
asymptotically distributed as a χ² with $v - 1$ d.f. for large $r$, where $a$ is $k/2$ if $k$ is even and
$(k - 1)/2$ otherwise. The χ² statistic appropriate for PBIBD is also given.
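
As an illustration of how the BIBD statistic above is evaluated (a minimal sketch; the function name and the example counts are hypothetical and not taken from the paper), the function below takes the counts m_i together with the design constants k, r and lambda, and returns the statistic to be referred to a χ² distribution with v - 1 degrees of freedom.

```python
def bibd_median_statistic(m, k, r, lam):
    """Statistic k^2 (k-1) / (a (k-a) lam v) * sum_i (m_i - r a / k)^2.

    m   : list of counts m_i, one per treatment (v = len(m))
    k   : block size
    r   : number of replications of each treatment
    lam : number of blocks in which each pair of treatments occurs together
    """
    v = len(m)
    a = k // 2  # k/2 if k is even, (k - 1)/2 otherwise
    if a == 0:
        raise ValueError("block size k must be at least 2")
    expected = r * a / k
    scale = k ** 2 * (k - 1) / (a * (k - a) * lam * v)
    return scale * sum((mi - expected) ** 2 for mi in m)


# Hypothetical counts for the design with v = 4 treatments in all 6 blocks of size k = 2
# (so r = 3 and lambda = 1).
statistic = bibd_median_statistic([3, 2, 1, 0], k=2, r=3, lam=1)
print(statistic)
```

For a complete block design (k = v, lambda = r) the scale factor reduces to k(k - 1)/(a(k - a)r), which gives a simple consistency check on the constants.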

Next, Hoeffding’s theorem on U-statistics extended by Lehmann to the case of two 
samples, has been extended to the case of c samples. This is then applied to derive a new 
test for the problem of c samples. The test criterion is in terms of the number of c-plets
that can be formed by choosing one observation from each sample such that the observation
from the $k$th sample is the least ($k = 1, 2, \ldots, c$).


2. Some Nonparametric Problems: II. V. P. BHAPKAR, University of North
Carolina and University of Poona. (By title)


Mood and Brown have considered some simple nonparametric regression problems. 
In this paper, their methods are extended to discuss some additional regression problems. 
Next some bivariate analysis of variance problems are considered. The step-down procedure 
is used to reduce the problem to one involving conditional univariate distributions, the 
other variate being regarded as a concomitant random variate. The regression methods 
developed earlier are used here in these bivariate problems. The method seems to be per- 
fectly general and could be extended to the general multivariate situation. 


3. On the Foundations of the Theory of Testing Hypotheses (Preliminary report).
ALLAN BIRNBAUM, New York University.


For testing between simple hypotheses $H_i$, $i = 1, 2$, an experiment is called simple if
it is equivalent, in the sense of the theory of comparison of experiments, to one observation
on $X$, where Prob$[X = 1 \mid H_i] = p_i$, Prob$[X = 0 \mid H_i] = q_i = 1 - p_i$, $i = 1, 2$, with $p_i$'s
known, $0 \le p_1 \le p_2 \le 1$. If various experiments are possible for a given testing problem,
and if one of these is selected by use of a definite random device unrelated to the hypoth- 
eses, the over-all procedure is called a mixture of experiments. It is proved that under 
minor restrictions every experiment is equivalent to a mixture of simple experiments called 
its components. The possible decompositions into components are characterized and shown 
to be not essentially unique, except for simple experiments, whose components are equiva- 
lent to the given experiment. It follows that customary interpretations of error-probabilities 
of a test, as indicators of strength of evidence provided by a test outcome, require critical 
and constructive revision which leads to a modified Neyman-Pearson theory in which the 
likelihood function holds a central position as a consistently interpretable primitive indi- 
cator of evidence relevant to hypotheses. Wald’s sequential test is given an elementary 
justification on these terms as a technique for informative inference. 


4. Unbiased Sequential Estimation for Certain Two Parameter Problems (Preliminary
report). B. BRAINERD, University of Western Ontario, I. CHORNEYKO,
University of Alberta, AND T. V. NARAYANA, University of Alberta.


The probability of a coin falling head is $p_1$ $(0 < p_1 < 1)$ if in the previous trial the
outcome was tail, and $p_2$ $(0 < p_2 < 1)$ if in the previous trial the outcome was head. At the
first trial the probability of a head is $p_1$. Using a technique devised by one of the authors,
sufficient partitions are obtained for a wide class of simple closed regions. The results of 
M. H. DeGroot (Ann. Math. Stat., Vol. 30 pp. 80-102) are shown to generalize, with the 
proper modifications, to the two parameter case. Estimable functions are explicitly given, 
and completeness of sampling plans proved for various regions. An analogue of the necessary 
and sufficient conditions of Lehmann and Stein for simple closed regions is being studied. 


5. Mathematical Models for Ranking from Paired Comparisons. H. D. BRUNK,
University of Missouri. (By title)


Several models are discussed in each of two categories: (I) Each possible ranking of items
is assumed to have a "utility" (for some segment of the community) which depends on
the expected scores of the items in paired comparisons. Special instances are models in which
"worth" of an item is defined in terms of its expected scores in comparisons with other
items. (II) Each item is assumed to have an intrinsic worth; these intrinsic worths determine
the expected scores.

The concept of "regularity" is introduced. Let the expected scores of Item A be at least
as large as those of Item B. A utility is regular if under these conditions every ranking in 
which Item A precedes Item B has at least as great utility as one in which they are inter- 
changed. This concept specializes to rankings based on worths. A necessary and sufficient 
condition is given in order that a linear utility may be regular. 

In the second category a "minimum assumption" model is introduced and discussed.
Let e(u, v) denote the expected score of an item of worth u when compared with one of 
worth v. The assumption is: e(u, v) is non-decreasing in u, non-increasing in v. 


6. Asymptotically Optimal Stopping Rules in Sequential Analysis (Preliminary
Report). HERMAN CHERNOFF, Stanford University.


It is desired to decide sequentially whether the mean $\mu$ of a normal distribution with
known variance is positive or negative. Suppose that an a priori distribution is given for $\mu$
which has positive density at $\mu = 0$. Suppose also that the loss due to coming to the wrong
conclusion is given by $r(\mu) = k|\mu| + o(1)$, as $\mu \to 0$. Finally suppose that the cost of
sampling $c \to 0$. For the optimal sequential procedure the main contribution to the Bayes risk
is given by those values of $\mu$ which are of the order of magnitude of $c^{1/3}$.

The optimal stopping rule is approximated by the solution of the analogous continuous 
problem involving a Wiener process. This problem in turn is reduced to the solution of a 
free boundary problem involving the heat equation. A method of constructing this boundary 
is proposed. 


7. Cross-Compounded Distributions. RICHARD A. EPSTEIN AND LLOYD R. WELCH,
Jet Propulsion Laboratory, California Institute of Technology.
(Introduced by L. A. Zadeh)


It is known that the generating function of the compound Poisson distribution has the 
property that the Poisson variable can be expressed as the sum of two or more independent 
variables. A particular method of ‘“‘cross-compounding”’ two distributions is suggested ; the 
same property obtains. In theory, any two distributions can be cross-compounded to pro- 
duce a third, unique distribution. However, frequently the mathematics become overly 
involved so that it is necessary to select the distributions with discretion. Examples are 
given wherein the negative binomial distribution is cross-compounded with the Exponential 
distribution and with the Poisson distribution. Other combinations are also suggested 
which might lend themselves to cross-compounding. 


8. Examples of Two Independent Separable Processes Whose Sum Is Not
Separable. T. FERGUSON, Princeton University.


Two examples are given of two independent stochastic processes, $X_t$ and $Y_t$, both of
which are separable in the sense defined in Doob's book, and yet whose sum $Z_t = X_t + Y_t$
is not a separable process. All processes considered are measurable. In the first example,
$X_t$ is a constant (i.e. non-random) function, while in the second example, $X_t$ and $Y_t$ are
identically distributed.


9. On the Exactness of the Missing Plot Procedure in a Randomized Block
Design. J. L. FOLKS AND D. L. WEST, Texas Instruments Incorporated.


The randomized block design is said to be an unbiased design in that it allows unbiased
estimates of treatment differences, an unbiased estimate of the error variance and an
unbiased test of treatment differences. This claim can be justified by assuming that the
observations are generated by the model $y_{ij} = \mu + b_i + t_j + e_{ij}$ where $e_{ij} \sim N(0, \sigma^2)$. It can
also be justified by considering the population of conceptual yields arising from all possible 
randomizations. In the case of a missing plot, the exact procedure described by Yates gives 
unbiased estimates of treatment differences, an unbiased estimate of the error variance, 
and an unbiased test of treatment differences. However, it is based only on the normal 
model, not upon randomization theory. The authors examine the missing plot procedure 
from the standpoint of randomization theory. The finite population of conceptual yields is 
examined where (1) the same block-treatment combination is always missing, and where 
(2) the same block-plot combination is always missing. In both cases, unbiased estimates 
of treatment differences are given by the usual estimates. The estimate of error variance 
and the test for treatment differences are unbiased in (1) under a restriction slightly weaker 
than homogeneity but appear not to be in (2) for any reasonable restriction. 
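
For readers unfamiliar with the missing plot procedure, the sketch below computes the usual textbook form of Yates' estimate of a single missing yield, (bB + tT - G)/((b - 1)(t - 1)), where B and T are the totals of the known yields in the affected block and treatment and G is the total of all known yields. It is an illustration only and is not taken from the abstract; the function name and the example table are hypothetical.

```python
def yates_missing_value(table, block, treatment):
    """Yates' estimate for one missing plot in a randomized block design.

    table[i][j] is the yield of treatment j in block i; the entry at
    (block, treatment) is ignored and may be None.
    """
    b = len(table)       # number of blocks
    t = len(table[0])    # number of treatments
    known = [(i, j) for i in range(b) for j in range(t) if (i, j) != (block, treatment)]
    block_total = sum(table[i][j] for i, j in known if i == block)
    treatment_total = sum(table[i][j] for i, j in known if j == treatment)
    grand_total = sum(table[i][j] for i, j in known)
    return (b * block_total + t * treatment_total - grand_total) / ((b - 1) * (t - 1))


# Hypothetical error-free additive yields y_ij = 10 + b_i + t_j with one plot missing.
block_effects = [0, 1, 3]
treatment_effects = [0, 2, -1, 4]
yields = [[10 + bi + tj for tj in treatment_effects] for bi in block_effects]
yields[1][2] = None  # the missing plot

estimate = yates_missing_value(yields, block=1, treatment=2)
print(estimate)
```

With error-free additive data the estimate reproduces the additive value of the missing cell, which is a convenient check on the formula.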


10. First Emptiness of Two Dams in Parallel. JOSEPH M. GANI, Columbia
University.


The paper considers the probabilities of first emptiness of two dams in parallel, both 
subject to steady releases at a constant unit rate, and fed by discrete Poisson inputs of 
unit size which are directed to the dam with lesser content. The problem is shown to be 
equivalent to that of the single dam fed alternately by the two ordered inputs
$0 \le \alpha, \beta \le 1$ $(\alpha + \beta = 1)$; starting with an initial content $z$, the probabilities of first
emptiness of the process beginning with an input $\alpha$, at the times


$T = z + [(n + 1)/2]\alpha + [n/2]\beta$ $(n = 0, 1, 2, \ldots)$


are given by $g_\alpha(z, T) = e^{-\lambda z}$ if $n = 0$, and
$g_\alpha(z, T) = e^{-\lambda z}\{\sum_j g_\beta(j\alpha + j\beta - \beta, [(n + 1)/2]\alpha + [n/2]\beta)(\lambda z)^{2j-1}/(2j - 1)!
+ \sum_j g_\alpha(j\alpha + j\beta, [(n + 1)/2]\alpha + [n/2]\beta)(\lambda z)^{2j}/(2j)!\}$ if $n = 1, 2, \ldots$,
where $g_\beta(z, z + [(k + 1)/2]\beta + [k/2]\alpha)$ $(k = 0, 1, 2, \ldots)$, the analogous probability
beginning with an input $\beta$, is given by an interchange of $\beta$ for $\alpha$ in the previous
equation. These probabilities may be evaluated recursively. A more convenient method
is found by reducing the process to an associated occupancy problem, when the
probabilities can be obtained by a rapid computational procedure. Generating functions of the
probabilities are derived, and the paper concludes with a general formulation of the dam 
problem when the times of arrival for two ordered non-negative inputs of random size 
form a Poisson process. 


11. Stochastic Approximation and "Minimax" Problems. L. A. GARDNER, JR.,
MIT Lincoln Laboratory. (By title)


With the exception of the Robbins-Monro and Kiefer-Wolfowitz processes, the tech- 
nique of stochastic approximation does not appear to have found a range of application 
consistent with the generality of its formulation (for exposition see C. Derman, ‘‘Stochastic 
approximation’’, Ann. Math. Stat., Vol. 27 (1956), pp. 879-886). In this paper we consider 
such an iterative scheme designed to estimate the minimum of a curve which is not a re- 
gression function but the a.s. supremum of an observable random variable depending upon 
a parameter. The range of the parameter is a known finite interval, and the possibility of 
the solution being a boundary point is admitted. ‘‘Deterministic’’ conditions of the usual 
kind are imposed. The procedure is formally a truncated Kiefer-Wolfowitz process with 
the estimate of slope calculated from observed largest values in samples whose sizes tend to
infinity as the iteration proceeds. Convergence with probability one is insured if this number
increases sufficiently fast, or equivalently the differencing interval decreases sufficiently
slowly, relative to a measure of the amount of probability in left neighborhoods of
the function to be minimized. Although it is easy to argue the existence of such a measure,
it cannot be assumed that anything is known concerning its (finite) value. This difficulty
is resolved by having the differencing interval to be used for obtaining the next iterant 
depend in an appropriate way on a sample of largest values at the present. Estimates of 
convergence rates are made and optimum values obtained for certain constants of the 
process. Examples show the applicability of the theory to diverse problems. 


12. Some Asymptotic Results for a Coverage Problem. MAX HALPERIN, Knolls
Atomic Power Laboratory. (By title)


Let $\Lambda_1, \Lambda_2, \ldots, \Lambda_n$ be a random sample from a population with probability density
$p(\Lambda)$, $0 \le \Lambda \le \Lambda_M$, $\Lambda_M$ finite. The set of line segments corresponding to the $\Lambda_i$ are cast
on the interval $(0, L)$, $L = n\Lambda_M$, in such a way that every admissible configuration of the
segments is equally likely. A configuration is admissible if (a) there is no overlapping of
segments with each other, (b) there is no overlapping of segments with 0 or $L$. Now suppose
a line of length $\lambda$ is cast at random in the interval $(0, L)$, $\lambda < L$; i.e. $y$, the coordinate of
the midpoint of the line of length $\lambda$, is distributed uniformly on $(\lambda/2, L - \lambda/2)$. We define
fractional coverage, $F$, as the fraction of the line of length $\lambda$ which is covered by the
segments of length $\Lambda_1, \Lambda_2, \ldots, \Lambda_n$ and consider the probability distribution of $F$ as $n, L \to \infty$
and $n\mu/L \to V$ where $\mu = E\Lambda$ and $0 < V < 1$. It is shown that
$\Pr\{F = 0\} = (1 - V)\exp\{-V\lambda/((1 - V)\mu)\}$, $\Pr\{F = 1\} = (V/\mu)\int_{\lambda}^{\Lambda_M} (y - \lambda)p(y)\,dy$
if $\lambda < \Lambda_M$ and is zero if $\lambda \ge \Lambda_M$;
for $0 < F < 1$, there are further (continuous) contributions to the cumulative probability
which unfortunately are critically dependent upon the nature of $p(\Lambda)$. One can show that
$EF = V$ independently of the specific nature of $p(\Lambda)$ for $\lambda > \Lambda_M$ but the variance is a
complex function of $p(\Lambda)$ which is not simply expressible even for specific $p(\Lambda)$. It can be shown
that for large $\lambda$, $F$ is normally distributed with mean $V$ and variance
$\mu V(1 - V)^2[1 + \sigma^2/\mu^2]/\lambda$ where $\sigma^2 = E\Lambda^2 - \mu^2$.

The above work was motivated by the need for a plausible graduation function to fit 
the distribution of Boron Carbide intercepted by neutron paths (Boron Carbide is used 
to control reactor power output). Although the above assumptions are quite naive relative 
to the actual complexity of the problem, preliminary experimental data suggests that 
use of the results to match a graduation function to two moments may adequately describe 
observed frequency distributions. 


13. Polya Type Distributions of Convolutions. SAMUEL KARLIN, Stanford
University, AND FRANK PROSCHAN, Sylvania Electric Products, Inc., Mt.
View, California.


This paper obtains several useful new theorems concerning successive convolutions of
Polya frequency densities, such as: If $f_1, f_2, \ldots$ are densities of non-negative random
variables with each $f_i$ a Polya frequency density of order $k$, then $g(n, x) = f_1 * f_2 * \cdots * f_n(x)$
(the $n$-fold convolution) is Polya type of order $k$ in the variables $n$ and $x$, where $n$ ranges over
the positive integers and $x$ traverses the positive real line. More generally, the following
theorem is derived: Let $f_1, f_2, \ldots$ be a sequence of Polya frequency densities of order $k$
for corresponding general real valued (not necessarily positive) random variables $X_1,
X_2, \ldots$. Then $h(n, x) = P[\sum_{i=1}^{n} X_i \ge x; \sum_{i=1}^{j} X_i < x, j = 1, 2, \ldots, n - 1]$ is totally
positive of order $k$ in $n$ and $x$, $n$ ranging over the positive integers and $x$ over the positive
axis. Applications of these theorems are given in inventory theory, probability, and
mathematics.
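
The first theorem can be spot-checked numerically in the simplest case. The sketch below is only an illustration: it assumes all f_i equal to the standard exponential density (a Polya frequency density of every order), so that g(n, x) is the gamma density x^(n-1) e^(-x)/(n-1)!, and it verifies the order 2 condition only, namely that every 2 x 2 determinant of g on a grid is non-negative. The helper names and the grid are hypothetical.

```python
import math


def gamma_density(n, x):
    """Density of the sum of n independent standard exponential variables."""
    return x ** (n - 1) * math.exp(-x) / math.factorial(n - 1)


def is_tp2(g, ns, xs, tol=1e-15):
    """Check that every 2 x 2 determinant of g(n, x) on the grid is non-negative."""
    for i, n1 in enumerate(ns):
        for n2 in ns[i + 1:]:
            for j, x1 in enumerate(xs):
                for x2 in xs[j + 1:]:
                    det = g(n1, x1) * g(n2, x2) - g(n1, x2) * g(n2, x1)
                    if det < -tol:
                        return False
    return True


NS = list(range(1, 7))                      # n = 1, ..., 6
XS = [0.25 * i for i in range(1, 21)]       # x = 0.25, 0.50, ..., 5.00

print(is_tp2(gamma_density, NS, XS))
# Negative control: reversing the direction of x destroys the property.
print(is_tp2(lambda n, x: gamma_density(n, 6.0 - x), NS, XS))
```

A grid check of this kind can refute total positivity but cannot prove it; it is meant only to make the definition concrete.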
14. A New Proof of the Continuity Theorem of Probability Theory. EMANUEL
PARZEN, Stanford University. (By title)


The continuity theorem states that if a sequence of characteristic functions $\varphi_n(t)$
converges to a characteristic function $\varphi(t)$ at each real $t$, then the corresponding distribution
functions converge, $F_n(x) \to F(x)$ at all continuity points $x$ of $F$. The presently known proofs
of this theorem are not constructive, but rather involve compactness arguments. This
paper gives a new constructive proof of the continuity theorem, based on the observation
that $\int_{-\infty}^{\infty} g(x)\,dF_n(x) \to \int_{-\infty}^{\infty} g(x)\,dF(x)$ for any bounded continuous function $g$ with
integrable Fourier transform. Details of the proof are given in Chapter 10 of my book Modern
Probability Theory and its Applications, John Wiley, New York, 1960. 


15. A New Inversion Formula. EMANUEL PARZEN, Stanford University. (By
title)


Let $g$ be a bounded integrable Borel function of a real variable which possesses right and
left hand limits at every real $x$. Let $g^*(x) = \{g(x + 0) + g(x - 0)\}/2$. Let


$\gamma(u) = (1/2\pi) \int_{-\infty}^{\infty} e^{-iux} g(x)\,dx.$


Then for any distribution function $F$ (with corresponding characteristic function $\varphi$)


$\int_{-\infty}^{\infty} g^*(x)\,dF(x) = \lim_{U \to \infty} \int_{-U}^{U} (1 - (|u|/U))\gamma(u)\varphi(u)\,du.$ The proof is given in Chapter
9 of my book Modern Probability Theory and its Applications, John Wiley, New York, 1960.
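
The inversion formula can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions, not the author's computation: g is taken to be the indicator of an interval [a, b], for which gamma(u) has a closed form, F is the standard normal distribution with characteristic function exp(-u^2/2), the limit is replaced by a large finite U, and the integral is truncated at |u| = 12 (where the characteristic function is negligible) and evaluated by the midpoint rule. All function names and numerical settings are hypothetical.

```python
import cmath
import math


def gamma_indicator(u, a, b):
    """gamma(u) = (1/2pi) * integral of exp(-iux) g(x) dx for g the indicator of [a, b]."""
    if u == 0.0:
        return complex((b - a) / (2.0 * math.pi))
    return (cmath.exp(-1j * u * a) - cmath.exp(-1j * u * b)) / (2.0 * math.pi * 1j * u)


def inversion_integral(a, b, U=1000.0, cutoff=12.0, steps=24000):
    """Midpoint-rule value of the weighted integral for F standard normal."""
    limit = min(U, cutoff)
    h = 2.0 * limit / steps
    total = 0.0 + 0.0j
    for i in range(steps):
        u = -limit + (i + 0.5) * h
        phi = math.exp(-0.5 * u * u)  # characteristic function of N(0, 1)
        total += (1.0 - abs(u) / U) * gamma_indicator(u, a, b) * phi * h
    return total


estimate = inversion_integral(-1.0, 1.0)
exact = math.erf(1.0 / math.sqrt(2.0))  # P(-1 <= X <= 1) for X standard normal
print(estimate.real, exact)
```

Because F is continuous here, the averaging of g at its two jump points does not affect the left-hand side; the small remaining discrepancy comes from using a finite U.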


16. A Law of Large Numbers for Dependent Random Variables. EMANUEL
PARZEN, Stanford University. (By title)


Let $X_1, X_2, \ldots$ be random variables with zero means and uniformly bounded variances.
Let $Z_n = (X_1 + \cdots + X_n)/n$. Let $C_n = E[X_n Z_n]$. Quadratic Mean Law of Large Numbers.
$Z_n \to 0$ in mean square as $n \to \infty$ if and only if $C_n \to 0$ as $n \to \infty$. Strong Law of Large
Numbers. $Z_n \to 0$ with probability one if $C_n = O(n^{-q})$ for some positive $q$. These results generalize
some of the known laws of large numbers for orthogonal and stationary sequences of random
variables. The proof is based on the identity $n^2 E[Z_n^2] + \sum_{k=1}^{n} E[X_k^2] = 2\sum_{k=1}^{n} k C_k$. Details
are given in Chapter 10 of my book Modern Probability Theory and its Applications, John 
Wiley, New York, 1960. 
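
The identity underlying the proof holds for every realization before expectations are taken, since (X_1 + ... + X_k)^2 - (X_1 + ... + X_{k-1})^2 = 2 X_k (k Z_k) - X_k^2 and the differences telescope. The short sketch below (hypothetical function name, arbitrary simulated data) checks this pathwise version numerically.

```python
import random


def identity_sides(xs):
    """Return (n^2 Z_n^2 + sum X_k^2, 2 sum k X_k Z_k) for one realization xs."""
    partial_sum = 0.0
    sum_squares = 0.0
    weighted = 0.0
    for x in xs:
        partial_sum += x              # X_1 + ... + X_k = k Z_k
        sum_squares += x * x
        weighted += x * partial_sum   # k X_k Z_k
    return partial_sum ** 2 + sum_squares, 2.0 * weighted


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
left, right = identity_sides(sample)
print(left, right)
```

Taking expectations of both sides gives the identity quoted in the abstract, with C_k = E[X_k Z_k].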


17. Inference in Stochastic Processes I: Testing Composite Hypotheses (Preliminary
report). M. M. RAO, Carnegie Institute of Technology.


Let $\{x(t), t \in T\}$ be a (real) stochastic process where $T$ is a linear Borel set. For any $n$,
let $t_1 < t_2 < \cdots < t_n$ be in $D$, a dense subset of $T$, and $f_{t_1, \ldots, t_n}(x_1, \ldots, x_n; \theta)$, or $f_n(x, \theta)$
say, be the finite dimensional density function (w.r.t. Lebesgue meas.) of the process,
which depends on $\theta = (\theta_1, \ldots, \theta_k)$, $k$ being independent of $n$. Suppose the testing problem
consists of the hypotheses $H_0: \theta \in \omega_0$ vs. $H_1: \theta \in \omega_1$ (based on one realization), where $\omega_0$
and $\omega_1$ are closed disjoint subsets of the (real) Euclidean $k$-space. Assume the following
conditions on the densities: (a) for all $n$, the carriers of $f_n(x, \theta)$ remain invariant for all $\theta$
in $\Omega = \omega_0 + \omega_1$, and $f_n$ are Baire densities, (b) if $\theta_1$ and $\theta_2$ in $\Omega$ are distinct, then
$f_n(x, \theta_1) \ne f_n(x, \theta_2)$ a.e., and (c) if $\xi(\theta)$ is any distribution function (d.f.) on $\Omega$ which assigns
positive probability to both $\omega_0$ and $\omega_1$, then
$(f_{n+1}(x, \theta) \int_{\Omega} f_n(x, \theta)\,d\xi(\theta) - f_n(x, \theta) \int_{\Omega} f_{n+1}(x, \theta)\,d\xi(\theta))$,
for any $\theta$ in $\omega$ (= $\omega_0$ or $\omega_1$), is either non-negative or non-positive for all $n$.
Theorem: If $\{x(t), t \in T\}$ is a real separable stochastic process without fixed points of
discontinuity and with the finite dimensional density functions $f_n(x, \theta)$ satisfying the conditions
(a)-(c), then, for a sufficiently large number of observations on the process at $t_i$ of $D$, there
exists an essentially unique Bayes solution, relative to an a priori distribution $\xi(\theta)$ on $\Omega$,
satisfying (c), for testing the composite hypotheses $H_0$ against $H_1$. Instead, $\theta$, being a
vector of $k$ components, may depend on $t$ (or $k$ on $n$). Then, if the condition (c) is suitably
modified, an analogous result obtains. Some applications are considered.


18. Testing of Hypotheses on Categorical Data. S. N. ROY, University of North
Carolina, AND V. P. BHAPKAR, University of Poona.


In an earlier paper, we have posed hypotheses, which might be considered to be generali- 
zations, appropriate to the categorical data (structured or unstructured), of the usual 
hypotheses in classical ‘normal’ univariate and multivariate analysis of variance and in 
analysis of various kinds of ‘normal’ association. The large sample tests for some such 
hypotheses have been offered earlier and for most of the rest are offered here. The theorem 
on minimum $\chi_1^2$ is proved along Cramér's lines and an independent justification for
Neyman's 'linearization' technique is given. It is also shown that for linear hypotheses the
minimum $\chi_1^2$ is exactly the same expression as the minimum sum of squares obtained by the
‘general least squares’ approach to a model involving some asymptotically normal vari- 
ables. 


19. On Tests of Certain Types of Hypotheses Involving the Dispersion Matrices
of Two or More Multivariate Normal Distributions and the Associated
Confidence Bounds. S. N. ROY, University of North Carolina, AND R.
GNANADESIKAN, Bell Telephone Laboratories.


For $N(\xi_i, \Sigma_i)$ $(i = 1, 2)$, one of the authors derived several years ago, on a certain
principle, a test for $H_0: \Sigma_1 = \Sigma_2$ against $H: \Sigma_1 \ne \Sigma_2$ with an acceptance region
$\mu_1 \le$ all ch$(S_1 S_2^{-1}) \le \mu_2$, where $S_1$ and $S_2$ are the sample dispersion matrices, and also the
associated confidence bounds. In this paper the same principle is used to derive tests for
$H_0: \Sigma_1 = \Sigma_2$ against the respective alternatives (i) $H$: all ch$(\Sigma_1 \Sigma_2^{-1}) > 1$,
(ii) $H$: all ch$(\Sigma_1 \Sigma_2^{-1}) < 1$, (iii) $H$: (i) $\cup$ (ii), (iv) $H$: at least one ch$(\Sigma_1 \Sigma_2^{-1}) > 1$
and (v) $H$: at least one ch$(\Sigma_1 \Sigma_2^{-1}) < 1$.
The associated confidence bounds are also obtained and interpreted, and finally, a partial
generalization of these results is made to the case of $k$ populations, with regard to both
testing of hypotheses and confidence bounds.


20. On the Monotonic Character of the Power Functions of Two Multivariate
Tests. S. N. ROY AND W. F. MIKHAIL, University of North Carolina.


The power function of the largest root test of normal multivariate linear hypothesis on 
means or of independence between two sets of variates involves, in each case, aside from 
the degrees of freedom, certain non-negative, non-centrality parameters. This paper sup- 
plies a relatively simple and compact proof that the power function monotonically in- 
creases as each parameter, separately, increases—a result that was conjectured and proved 
(but not published) by one of the authors several years ago by a very lengthy and laborious 
method. It is believed that, with suitable and slight modifications, the method used here 
should prove useful in proving or disproving similar results in a wide variety of problems 
in testing of hypotheses involving multivariate normal distributions. 







21. Confidence Bounds for an Integral Function of an Estimate with Applications
to Reliability Theory. SAM C. SAUNDERS, Boeing Scientific Research
Laboratories.


Let $X$ and $Y$ be independent random variables with distributions $F \in \mathcal{F}$ and $G \in \mathcal{G}$,
respectively, where $\mathcal{F}$ and $\mathcal{G}$ are subsets of the class of continuous distributions on given
positive sample spaces $\mathcal{X}$ and $\mathcal{Y}$. Let $\omega$ be a homeomorphism from $\mathcal{Y}$ into $\mathcal{X}$ and define
$H(\omega) = \int F(\omega)\,dG$. From samples $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ we form $F_n$ and $G_m$,
estimates of $F$ and $G$, respectively, and define $\hat H$, the empirical estimate of $H$, by
$\hat H(\omega) = \int F_n(\omega)\,dG_m$ for $\omega \in \Omega$, a class of homeomorphisms linearly ordered by $H$.

We are interested in problems associated with this phenomenological interpretation.
For some device: let $\omega(Y)$ be the taxation on life under usage $\omega$ and let $X$ be the capacity
for endurance. Then $H(\omega) = P[X < \omega(Y)]$ is the unreliability and $\hat H(\omega)$ is an estimate
of this unreliability. Using $\hat H$ to determine a maximum usage $\hat\omega$, what is the probability
the unreliability $H(\hat\omega)$ is too large? We define $\hat\omega$ so that $H(\hat\omega)$ is distribution-free re
$\mathcal{F} \times \mathcal{G}$ or obtain a stochastic bound majorizing $H(\hat\omega)$ for each $(F, G) \in \mathcal{F} \times \mathcal{G}$ under the
assumption $F_n(F^{-1})$ is distribution-free re $\mathcal{F}$ and similarly for $G_m$, $G$. This provides an answer in one
portant application and the theory is developed so that many such reliability problems 
can be treated. 


22. A Rank Sum Test for Comparing all Pairs of Treatments. Robert G. D.
Steel, Cornell University.


Consider a permutation of $n_1$ $X_1$'s, $\ldots$, $n_k$ $X_k$'s with $n_1 \le \cdots \le n_k$ arising from
ordering, from smallest to largest, observations on $k$ treatments. Assign ranks $1, \ldots, n_i + n_j$ to
the observations on all possible pairs of treatments and sum the ranks assigned to the
observations on the treatment with lower subscript. This gives a test criterion denoted by
$(T_{12}, \ldots, T_{1k}, T_{23}, \ldots, T_{k-1,k})$. A recursion formula is developed for computing
probabilities and is used to show, by induction, that $\mu(T_{ij}) = n_i(n_i + n_j + 1)/2$,
$\sigma^2(T_{ij}) = n_i n_j (n_i + n_j + 1)/12$, $\sigma(T_{ij}T_{ik}) = n_i n_j n_k/12 = \sigma(T_{ik}T_{jk})$,
$\sigma(T_{ij}T_{jk}) = -n_i n_j n_k/12$ and $\sigma(T_{ij}T_{kl}) = 0$. From the distribution of
$(T_{12}, \ldots, T_{k-1,k})$, the distribution of $\min\{T_{ij}\}$ can be obtained. Several such
distributions are computed for a common value of $n$. These provide critical values for a
non-parametric multiple comparison rank sum test.
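
The moment formulas in this abstract can be checked on a small case by brute force. The sketch below does not use the author's recursion formula, which the abstract does not reproduce; instead it enumerates every distinct ordering of the treatment labels, each equally likely under the null hypothesis, and computes exact means, variances, and covariances of the pairwise rank sums. The function names and the choice of sample sizes (2, 2, 3) are assumptions made only for illustration.

```python
from fractions import Fraction
from itertools import permutations


def pair_rank_sum(ordering, a, b):
    """Rank sum of treatment a when only observations from a and b are ranked."""
    sub = [t for t in ordering if t in (a, b)]  # already ordered smallest to largest
    return sum(rank for rank, t in enumerate(sub, start=1) if t == a)


def null_moments(sizes):
    """Exact null means and covariances of all T_ij by full enumeration."""
    labels = [t for t, n in enumerate(sizes) for _ in range(n)]
    orderings = sorted(set(permutations(labels)))  # distinct, equally likely
    k = len(sizes)
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    values = {p: [pair_rank_sum(o, *p) for o in orderings] for p in pairs}
    total = len(orderings)
    mean = {p: Fraction(sum(v), total) for p, v in values.items()}

    def cov(p, q):
        cross = sum(x * y for x, y in zip(values[p], values[q]))
        return Fraction(cross, total) - mean[p] * mean[q]

    return mean, cov


# Treatments are indexed 0, 1, 2 here, with hypothetical sizes 2, 2, 3.
mean, cov = null_moments((2, 2, 3))
print(mean[(0, 1)])            # n_i (n_i + n_j + 1) / 2
print(cov((0, 1), (0, 1)))     # n_i n_j (n_i + n_j + 1) / 12
print(cov((0, 1), (0, 2)))     # n_i n_j n_k / 12
print(cov((0, 1), (1, 2)))     # -n_i n_j n_k / 12
```

Enumeration grows factorially, so this is only practical for very small samples; it is meant as a check on the stated formulas rather than as a way to build tables of critical values.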


23. Asymptotic Expansions for the Mean and Variance of the Serial Correlation
Coefficient. John S. White, Aero Division, Minneapolis Honeywell Regulator
Co.


Following the procedure used by W. J. Dixon (Ann. Math. Stat., 1944, pp. 119-144),
series expansions are obtained for the first two moments of
$\hat{\alpha} = \sum x_t x_{t-1} / \sum x_{t-1}^2$, where $(x_t)$ is a first order
auto-regressive Gaussian process with parameter $\alpha$. The series expansions are
carried out to terms of order 7~* and a‘, thus extending the asymptotic results
of several authors.

The results are obtained for both the stationary and fixed initial variate case. 
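
A minimal sketch of the statistic discussed in this abstract, taking the serial correlation coefficient to be the ratio of the sum of lagged products to the sum of squared lagged values (an assumption about the abstract's notation). The simulator covers the two cases mentioned, a stationary start and a fixed initial variate; the unit innovation variance, the fixed start at zero, and all function names are hypothetical choices for illustration, and none of the series expansions themselves are reproduced.

```python
import random


def serial_correlation(x):
    """Ratio sum(x[t] * x[t-1]) / sum(x[t-1] ** 2) over t = 1, ..., len(x) - 1."""
    if len(x) < 2:
        raise ValueError("need at least two observations")
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    if den == 0:
        raise ValueError("lagged values are all zero")
    return num / den


def simulate_ar1(alpha, T, rng, stationary=True):
    """Simulate x_0, ..., x_T with x_t = alpha * x_{t-1} + e_t, e_t ~ N(0, 1)."""
    if stationary:
        if abs(alpha) >= 1:
            raise ValueError("a stationary start requires |alpha| < 1")
        x0 = rng.gauss(0.0, (1.0 - alpha ** 2) ** -0.5)  # stationary std. dev.
    else:
        x0 = 0.0  # fixed initial variate (assumed to be zero here)
    x = [x0]
    for _ in range(T):
        x.append(alpha * x[-1] + rng.gauss(0.0, 1.0))
    return x


def monte_carlo_mean(alpha, T, reps, seed=0, stationary=True):
    """Average of the estimate over independent simulated series."""
    rng = random.Random(seed)
    draws = [serial_correlation(simulate_ar1(alpha, T, rng, stationary))
             for _ in range(reps)]
    return sum(draws) / reps
```

Comparing `monte_carlo_mean(alpha, T, reps)` with `alpha` for small `T` gives a rough numerical feel for the small-sample bias that such expansions describe.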





NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.


Personal Items 


Professor Robert Bechhofer, having spent his sabbatic leave at Stanford Uni- 
versity, has returned to the Department of Industrial and Engineering Adminis- 
tration, Sibley School of Mechanical Engineering, Cornell University. While at 
Stanford, Dr. Bechhofer held an appointment as Visiting Professor of Statistics 
in the Applied Mathematics and Statistics Laboratory and in the Department 
of Preventive Medicine. 

Richard E. Beckwith received a Ph.D. degree in Mathematical Statistics from 
Purdue University in May, 1959. He has been a senior research engineer at the 
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California,
following his resignation from the staff of Purdue University in September,
1958. 

Allan Birnbaum has been appointed Associate Professor of Mathematical 
Statistics in the Department of Mathematics and the Institute of Mathematical 
Sciences at New York University. 

Austin J. Bonis received his Ph.D. from the George Washington University in 
June 1959, and is now continuing his work as Deputy Director, Analysis Division, 
in the Office of the Assistant Secretary of Defense for Manpower and Personnel 
in the Department of Defense. Dr. Bonis has prepared a solution book to Dr. S.
Kullback’s Information Theory and Statistics (J. Wiley). A limited number of
copies are on sale at the George Washington University Book Store at $6.25 each. 

Ralph A. Bradley has joined the faculty of The Florida State University at 
Tallahassee, Florida. He will be Chairman of the Department of Statistics at
the University. During the past nine years, Dr. Bradley has been Professor of
Statistics and consultant to the Agricultural Experiment Station at the Virginia 
Polytechnic Institute. 

Beginning in September 1959, Dr. Irwin D. J. Bross will head the Department 
of Statistics at Roswell Park Memorial Institute. 

Professor A. Clifford Cohen, Jr., has been named Director of the newly estab- 
lished University of Georgia Institute of Statistics in Athens, Georgia. 

Martin Fox has accepted a position as Assistant Professor in the Department 
of Statistics at Michigan State University. 

Irwin Guttman has been appointed Associate Professor of Mathematical 
Statistics, Department of Mathematics, McGill University. 

Mr. Robert H. Hoskins will receive the title of Associate Group Actuary, 
effective September 1, 1959. 

Richard C. Kao, Senior Mathematician, System Development Corporation, 
Santa Monica, California, is now a Senior Associate, Planning Research Corpora- 
tion, Los Angeles 24, California. 

Margaret P. Martin has recently joined the staff in the Department of Biostatistics
of the School of Hygiene and Public Health of Johns Hopkins University
as Associate Professor. She was previously with the State University of New
York. 

Dr. Hugh J. Miser, formerly Deputy Assistant for Operations Analysis at Hq 
US Air Force, has joined the staff of the Research Triangle Institute of Durham, 
North Carolina, as Head of its Operational Sciences Laboratory. 

M. C. Pike has been appointed to the post of assistant in Mathematical Sta- 
tistics. 

Dr. K. C. S. Pillai has resumed his duties at the Statistics Office of the United
Nations at New York after spending more than three years in the Philippines 
where he was United Nations Senior Statistical Adviser in Mathematical Sta- 
tistics and Visiting Professor of Statistics at the Statistical Center, University 
of the Philippines, Manila. 

William K. Robinson has taken the position of Vice-President and Actuary 
with the First National Life Insurance Company of Phoenix, Arizona. 

The Board of Directors of the Mathematical Centre at Amsterdam has advised 
us of the death of David van Dantzig, Professor at the University of Amsterdam, 
Member of the Board and Head of the Departments of Mathematical Statistics 
and Applied Mathematics of the Mathematical Centre. He passed away at the 
age of 58 years on July 22nd 1959 after a sudden heart attack. 

James K. Yarnold has recently joined the staff of General Analysis Corpora- 
tion, Arizona Office. 




NEW MEMBERS 
The following persons have been elected to membership in The Institute 


May 7, 1959, to August 10, 1959 


Barr, David R., M.S., (Miami University); Student, U. S. Air Force Institute of Tech- 
nology, Wright-Patterson AFB, Ohio, with duty station State University of Iowa, 
Iowa City, Iowa; Box 608, Iowa City, Iowa.

Basmann, Robert L., Ph.D., (Iowa State College); Operations Research Analyst, Hanford
Laboratories Operation, Operations Research and Synthesis Operation, General Electric 
Company, Hanford Atomic Products Operation, Richland, Washington. 

Bhat, Beliyar Ramdas, M.A., (Karnatak University); Lecturer in Statistics, Karnatak
University, Dharwar, India; Department of Statistics, University of California, Berkeley 
4, California. 

Bobb, James C., B.S., (Carnegie Institute of Technology); Statistical Engineer, Betz 
Laboratories, Inc., Gillingham and Worth Streets, Philadelphia 24, Pa. 

Booker, Aaron H., M.A., (North Texas State College); Graduate Student, Iowa State 
University, Ames, Iowa; 598 Pammel Court, Ames, Iowa. 

Bradford, Clarence H., M.A., (University of Chicago); Associate Director, Army Institute 
Project, The University of Chicago, 5757 South Drexel Avenue, Chicago 87, Illinois. 

Brown, Bradford S., M.S., (University of Illinois); Service Engineer, Engineering Department,
E. I. du Pont de Nemours and Company, Buffalo Avenue and Chemical Road, Niagara
Falls, New York.

Bryson, Marion R., Ph.D., (Iowa State College); Research Associate, Duke University 
Office of Ordnance Research, Durham, North Carolina; Box EM, Duke Station, Durham,
North Carolina. 

Charles, Gerald T., B.S., (Roosevelt University); Statistician, Remington Rand Univac, 
1902 West Minnehaha Avenue, St. Paul, Minnesota; 1402 North Dupont Avenue, Min- 
neapolis 11, Minnesota. 

Chen, Robert J.T., M.S., (Oklahoma State University); Graduate Student, Oklahoma 
State University, Stillwater, Oklahoma; Statistical Laboratory, Oklahoma State Uni- 
versity, Stillwater, Oklahoma. 

Chu, Herbert H., M.S., (Oklahoma State University); Graduate Student, Statistical Laboratory,
Oklahoma State University, Stillwater, Oklahoma.

Denny, John L., Jr., B.A., (Stanford University); Research Assistant, University of
California, Berkeley, California; 470 Panoramic Way, Berkeley 4, California. 

Douglas, Alan W., M.Sc., (McGill University); Assistant, Biometrics Unit, Department of 
Plant Breeding, Cornell University, 337 Warren Hall, Ithaca, New York; 187 Northview 
Road, Ithaca, New York. 

Dubey, Satya D., B.Sc., (Patna University); Special Graduate Research Assistant, Department
of Statistics, Michigan State University, East Lansing, Michigan.

Eicker, Friedhelm, Dr.rer.nat., (University of Mainz); Research Associate, Department of 

MEETING OF ISI


The 32nd Session of the International Statistical Institute will be held in Tokyo 
from May 30 to June 9, 1960. The program covers a wide variety of topics in 
theoretical and applied statistics. Detailed information about registration, pro- 
gram, etc. may be obtained from either Mr. Masao Goto, Organizing Committee 
of the 32nd I.S.I. Session, c/o Statistical Standards Bureau, Administrative Management
Agency, 5 Sannen-cho, Chiyoda-ku, Tokyo, Japan, or Permanent Office,
International Statistical Institute, 2 Oostduinlaan, The Hague, Netherlands. 




NATIONAL BUREAU OF STANDARDS POSTDOCTORAL 
RESEARCH ASSOCIATESHIPS 


Research associateships, supported by the National Bureau of Standards and 
awarded on recommendations of the National Academy of Sciences—National 
Research Council, are offered to provide young investigators of unusual ability 
and promise the opportunity for basic research in various branches of the physical 
and mathematical sciences. It is expected that approximately 20 awards may be 
made in a total of twenty-nine fields, of which the following are of particular 
interest to members of the institute: Pure and Applied Mathematics, Applied 
Mathematical Statistics, Numerical Analysis. These research associateships are 
open only to citizens of the United States, and in the foregoing fields are tenable 
only at the National Bureau of Standards in Washington, D. C. Applicants must 
have received (or completed the requirements for) a Ph.D. or Sc.D. degree, or 
the equivalent, in one of the fields listed above at the time of entering upon the 
research associateship. 

The annual gross stipend will be $7510 and will be subject to income tax. 
Travel and moving expenses of the Research Associate and his family from place
of residence to Washington, D. C. will be paid by the National Bureau of Standards.
Awards will be made about April 1, 1960. Unless otherwise arranged the
tenure of a research associateship may begin after July 1, 1960 and continue for 
one year, with provision for a vacation period. 

Requests for application forms or for additional information should be addressed
to the Fellowship Office, National Academy of Sciences—National Research
Council, 2101 Constitution Avenue, N. W., Washington 25, D. C.
Applications for the academic year 1960-1961 must be received in the Fellowship 
Office no later than February 1, 1960. 




FELLOWSHIP AND RESEARCH OPPORTUNITIES 


The Division of Mathematics, National Academy of Sciences—National 
Research Council, has published a leaflet listing some private and government 
agencies that offer support of mathematical study and research at the graduate
and post-graduate levels. Copies of the leaflet are available upon application to 
Division of Mathematics, National Academy of Sciences—National Research 
Council, 2101 Constitution Avenue, Washington 25, D.C. 


NEW EDITORIAL STAFF FOR MTAC 


The Division of Mathematics of the National Academy of Sciences—National 
Research Council announces that Harry Polachek, Technical Director of the 
Applied Mathematics Laboratory of the David Taylor Model Basin, has been 
appointed Chairman of the Editorial Committee for the quarterly journal Math- 
ematical Tables and Other Aids to Computation effective January 1959. He succeeds 
C. B. Tompkins of the University of California at Los Angeles, who held the post 
since November 1954. The other members of the Editorial Committee are: C. C. 
Craig, A. Fletcher, E. Isaacson, D. Shanks, C. V. L. Smith, A. H. Taub, C. B. 
Tompkins and J. W. Wrench, Jr. 

Articles for publication in Mathematical Tables and Other Aids to Computation 
should be addressed to Harry Polachek, Editor, Mathematical Tables and Other
Aids to Computation, David Taylor Model Basin, Washington 7, D.C. 

Information on subscriptions may be obtained from National Academy of 
Sciences, Printing and Publishing Office, 2101 Constitution Avenue, N. W., Wash- 
ington 25, D.C. 




FLORIDA STATE UNIVERSITY DEPARTMENT OF STATISTICS 


The Florida State University at Tallahassee, Florida has established a Depart- 
ment of Statistics effective July, 1959. The initial faculty will consist of Ralph A. 
Bradley, Chairman, John L. Bagg and Lonnie L. Lasman. Programs of study 
leading to the Bachelor of Science and Master of Science degrees in statistics 
will be initiated in the Fall 1959 semester and advanced graduate work will be 
developed in the near future. The Department of Statistics will provide univer- 
sity-wide training and consulting. Inquiries regarding the program will be wel- 
comed and some assistantships will be available for graduate students. 




KANSAS STATE UNIVERSITY DEPARTMENT OF STATISTICS 


On July 1, 1959, Kansas State University at Manhattan, Kansas, established a
new Department of Statistics. A Statistical Laboratory was organized at Kansas 
State in 1946 and will continue as a consulting and computing unit sponsored by 
the Agricultural Experiment Station. 

The Department of Statistics presently has a five-man staff: Robert S. 
Cochran, Arlin M. Feyerherm, H. C. Fryer, Gary F. Krause, and Stanley 
Wearden, plus graduate and research assistants. A Master’s degree in statistics 
has been offered for a number of years through the Department of Mathematics. 
It is planned that a Ph.D. in statistics will be offered soon. 

RUTGERS STATISTICS CENTER


Rutgers University has established a Statistics Center which will be the State 
University’s central unit for research and teaching in the field of statistics. It 
will continue to offer at Rutgers the Master’s Degree program and in addition
will have responsibility for a program of study leading to the Ph.D. in Applied
and Mathematical Statistics.


The Center, which is under the administration of Dr. Marion A. Johnson, 
dean of the Graduate School, will be directed by Dr. Ellis R. Ott, who has pre- 
viously served as professor of Mathematics and Chairman of the Mathematics 
Department of University College, as well as Chairman of the Rutgers program 
in Applied and Mathematical Statistics. Director of Research at the Center will 
be Dr. Martin B. Wilk. The staff will also include Dr. Roger S. Pinkham, Dr.
Mason E. Wescott and Harold F. Dodge. 


The Statistics Center will be responsible for programs of research in statistical 
theory and methodology and in ways of promoting the more effective use of 
statistics in science, industry, education and other areas. 

The Center will also make available, both within and without the University, 
consultative services with respect to the use of statistics and statistical tech- 
niques in research or other experiments and surveys, and with respect to the 
analysis and presentation of data. 

Since 1952, Rutgers has had course work leading to a Master’s degree in the 
field of Applied and Mathematical Statistics. Through June, 1959, a total of 85 
Master’s degrees in applied and mathematical statistics had been awarded, 
mostly to full-time employees in nearby industries. The present development 
constitutes a major extension of the Rutgers Statistics Program plus an adminis- 
trative reorganization. 

The establishment of the Statistics Center is predicated on the belief that a 
group within the University charged with instructional, research, and consulta- 
tive functions in the field will both advance and strengthen the University’s 
program in statistics, and also further the effective use of statistical tools and 
techniques by the many members of the University staff who, though not spe- 
cialists in statistics, must of necessity use statistics in their own work. 



Annuario Estadistico de España, Edicion Manual, Presidencia del Gobierno,
Instituto Nacional de Estadistica, Ferraz 41, Madrid, Spain, 1959, 977 pp. 

ing Theory, Stanford University Press, Stanford, California, 1959, 432
pp., $11.50 

Cahiers du Séminaire D’Econométrie, No. 5, Production, investissements et pro- 
ductivité, Editions du Centre National de la Recherche Scientifique, 13 Quai 

Statistical Handbook, 1958, Central Statistical Office, Peoples Republic of
Bulgaria, Sofia, 1959, 223 pp. 